SOME INTERESTING FEATURES OF FREQUENCY CURVES 
By Richmond T. Zoch 

Introduction 

It is well known that in the normal error curve the points of inflection are 
equidistant from the mode. However, it has never been pointed out that this is 
also a characteristic of all of the bell-shaped Pearson Frequency Curves. This 
fact can be most easily shown by placing the mode at the abscissa x = 0. 

Many rough checks have been developed for use in applying the Theory of 
Least Squares. The second part of this paper develops a rough check on the 
computation for use when fitting a Pearson Frequency Curve to a set of observa- 
tions. No rough checks on computation are given in textbooks on Pearson’s 
Frequency Curves. 

At present it is customary to follow a separate procedure for each Type of 
curve when computing the constants of a Pearson Frequency Curve. The 
third part of this paper shows how a single system may be followed for all Types. 
A single procedure is very desirable in order that the rough check of Part 2 may 
be quickly applied. 


Part 1. Points of Inflection. 

Perhaps nothing brings out the limitations of the bell-shaped Pearson Curves 
in a more striking manner than a discussion of their points of inflection. In 
dealing with frequency curves it is well known that any curve can be fitted to a 
given distribution and that the real problem in curve fitting is the selection of a 
curve. Figures 1, 2, and 3 illustrate three hypothetical histograms. All three 
of these histograms are bell-shaped yet none of them will be closely fitted by 
any of the Pearson Curves. The reasons will be pointed out presently. 

The differential equation from which Pearson derived his system of frequency 
curves is 

$$\frac{dy}{dx} = \frac{y(x - P)}{b_2x^2 + b_1x + b_0}.$$

By putting $x - P = X$, i.e. by placing the mode at the abscissa $X = 0$, this 
differential equation may be written: 

$$\frac{dy}{dX} = \frac{yX}{\pm B_2X^2 \pm B_1X + B_0},$$

where the + or − sign is taken according to the type of the curve. (It will be 
shown later that the constant term of the denominator must be less than zero.) 
Since in the Type III curve $B_2$ is 0 and in the "Normal Curve" both $B_2$ and 
$B_1$ are 0 it will be advantageous to consider the general case of 

$$\frac{dy}{dX} = \frac{yX}{F(X)},$$

where $F(X)$ is an integral rational function of the $n$th degree, at once rather than 
considering special cases first. 


If 

$$\frac{dy}{dX} = \frac{yX}{F(X)},$$

then 

$$\frac{d^2y}{dX^2} = \frac{y}{[F(X)]^2}\left(X^2 + F(X) - XF'(X)\right).$$

In order to locate the points of inflection, $d^2y/dX^2$ is equated to zero. Then we have: 

$$X^2 + F(X) - XF'(X) = 0. \tag{1}$$


This equation is always of the same degree as $F(X)$ except when $F(X)$ is linear or 
constant. Hence we have proved the Theorem: If $y = G(X)$ be the solution 
of the differential equation 

$$\frac{dy}{dX} = \frac{yX}{F(X)},$$

then the number of points of inflection of $y$ cannot exceed the degree of $F(X)$ 
when $F(X)$ is of degree greater than one. 

Now $F(X) = B_nX^n + B_{n-1}X^{n-1} + \cdots + B_2X^2 + B_1X + B_0$. Whence 
equation (1) can be written in the form: 

$$(1-n)B_nX^n + (2-n)B_{n-1}X^{n-1} + (3-n)B_{n-2}X^{n-2} + \cdots + (r-n)B_{n-r+1}X^{n-r+1} + \cdots - 3B_4X^4 - 2B_3X^3 + (1-B_2)X^2 + B_0 = 0.$$

Hence we have established the Theorem: The coefficient of the linear term of 
$X$ in the equation of the points of inflection is zero. 




For the "Normal Curve" and also for Type III, 

$$B_2 = B_3 = B_4 = \cdots = B_n = 0.$$

Hence the points of inflection of these two Types are given by $X = \pm\sqrt{-B_0}$. 
For Types I and II, $B_2$ is positive and $B_3 = B_4 = \cdots = B_n = 0$, and the 
points of inflection are $X = \pm\sqrt{-B_0/(1 - B_2)}$. These points of inflection are 
undefined if $B_2 = 1$, are pure imaginary if $B_2 > 1$, and real if $B_2 < 1$. 
For Types IV, V, VI and VII, $B_2$ is negative and $B_3 = B_4 = \cdots = B_n = 0$, and the 
points of inflection are at $X = \pm\sqrt{-B_0/(1 - B_2)}$. 

In some of these Types it may happen that the abscissae of the points of 
inflection though real will lie beyond the range of the curve. Thus Types III 
and VI may have 1 or 2 points of inflection, the single point of inflection occurring 
when $\sqrt{-B_0/(1 - B_2)}$ exceeds the range of the curve in the direction that the range is 
limited. Type II may have 0 or 2 points of inflection, as there will be no real 
points of inflection when $B_2 \geq 1$. Type I may have 0, 1 or 2 points of inflection. 
Types IV, V and VII as well as the "Normal Curve" always have two and only 
two points of inflection. 

Now it should be noted that when one of the eight bell-shaped Pearson curves 
has two points of inflection then the abscissae of these 2 points of inflection are 
equidistant from the abscissa of the mode. In figure 1 a point of inflection will 
be at abscissa b and another at abscissa a. (M is the abscissa of the mode.) 
Since $b - M \neq M - a$ none of the Pearson curves will fit this histogram closely. 
In figure 2, points of inflection occur at abscissae a, b, and c. Since a Pearson 
curve can have at most two points of inflection no Pearson curve will fit this 
histogram closely. In figure 3 there are four points of inflection and no Pearson 
curve will fit this histogram closely. 
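
As a small illustration of Part 1, the abscissae of the points of inflection can be computed directly from the constants of the denominator. The sketch below is not part of the original paper; it assumes the signed convention in which the inflection equation reduces to $(1 - B_2)X^2 + B_0 = 0$, and the function name `inflection_abscissae` is purely illustrative.

```python
import math

def inflection_abscissae(b0, b2):
    """Real roots of (1 - B2) X^2 + B0 = 0, with X measured from the mode.

    Returns (-X0, +X0) when the roots are real, otherwise None.
    The result is always symmetric about the mode (X = 0).
    """
    if b2 == 1:
        return None              # equation degenerates: points undefined
    ratio = -b0 / (1.0 - b2)
    if ratio <= 0:
        return None              # roots are not real and distinct
    x0 = math.sqrt(ratio)
    return (-x0, x0)

# Normal curve y = exp(-X^2 / 2): dy/dX = yX / (-1), so B0 = -1, B2 = 0.
print(inflection_abscissae(-1.0, 0.0))
```

Whether a real root actually lies inside the range of the curve still has to be checked separately, as the text explains for Types I, III and VI.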


Part 2. Range 

Definition: A bell-shaped curve is a continuous curve which starts at zero 
(or zero as a limit), rises to a single maximum, at which maximum point the 
first derivative is zero, and then falls to zero (or zero as a limit). 

Or, more formally, $y = G(x)$ is a bell-shaped curve if $G(x_1) = G(x_2) = 0$ and 
if $G'(P) = 0$ and $G''(P) < 0$ where $G(x)$ is continuous and does not vanish in the 
interval from $x_1$ to $x_2$ and $P$ is a unique point in this interval. 

If a bell-shaped curve has the value of zero at two finite points, one on each 
side of the maximum (mode), it is said to be of limited range in both directions, 
or briefly, of limited range. 

If a bell-shaped curve has the value of zero at only one finite point it is said 
to be of limited range in one direction, or also of unlimited range in one direction. 

If a bell-shaped curve has the value of zero only at $\pm\infty$, i.e. at no finite points, 
it is said to be of unlimited range in both directions, or briefly, of unlimited range. 

Theorem I: If $F(x)$ can be separated into a finite number of factors each 
either of the form $(x - r_{1i})$ or $(x^2 + 2r_{2j}x + r_{2j}^2 + r_{0j}^2)$ where no real root is 
repeated and $y = G(x)$ is a bell-shaped curve which is a solution of the differential 
equation 

$$\frac{dy}{dx} = \frac{y(x - P)}{F(x)},$$

then if $F(x)$ has no real roots, $y$ is of unlimited range in both directions; if all of 
the real roots of $F(x)$ lie on the same side of $P$, $y$ is of limited range in one (that) 
direction; if at least one real root of $F(x)$ lies on one side of $P$ and at least one on 
the other side, $y$ is of limited range in both directions. 

Proof: If $F(x) = 0$ when $x = P$, we have 

$$\frac{dy}{dx} = \frac{y}{g(x)},$$

where $g(x) = F(x) \div (x - P)$. This derivative is zero only when $y = 0$ or 
$g(x) = \pm\infty$. Hence the solution does not have a finite maximum and therefore 
is not a bell-shaped curve. If $F(x) > 0$ when $x = P$, we have 

$$\left.\frac{d^2y}{dx^2}\right|_{x=P} = \frac{y}{F(P)},$$

which is greater than zero and, since at a maximum the second derivative must 
not be greater than zero, in this case the solution would have a minimum at 
$x = P$ and therefore would not be a bell-shaped curve. As the theorem concerns 
only those solutions which are bell-shaped curves, $F(x) < 0$ when $x = P$. If 
$F(x) = 0$ when $x \neq P$ then $dy/dx = \pm\infty$ unless $y$ is also zero. Assume $y \neq 0$. 
Since $F(x)$ is negative, if $y \neq 0$ when $F(x) = 0$ then $dy/dx \to -\infty$, as $F(x) \to 0$, 
for an $x > P$, and changes to $+\infty$ as $F(x)$ changes sign on passing through the 
value 0. Hence the curve would contain another maximum before falling to 
zero and therefore the solution is not a bell-shaped curve. Similar reasoning 
holds for an $x < P$. Therefore if $y \neq 0$ when $F(x) = 0$, the curve is not bell-shaped. 
If $y = 0$ when $F(x) = 0$, the curve has its range limited at this point. 
That is, any real number which makes $F(x)$ vanish will also make $y$ vanish if $y$ 
represents a bell-shaped curve. Hence if all of the real roots lie on the same side 
of $P$ the curve is of limited range in that direction only, while if at least one of 
the real roots lies on each side of $P$ the curve is of limited range in both directions. 
If $F(x)$ contains no real roots it does not vanish for any real value of $x$. 
In this case, by partial fractions the differential equation becomes: 

$$\frac{dy}{y} = \frac{k_{21}\,dx}{(x + r_{21})^2 + r_{01}^2} + \frac{2k_{31}(x + r_{21})\,dx}{(x + r_{21})^2 + r_{01}^2} + \cdots$$

On integrating, 

$$y = C\left[(x + r_{21})^2 + r_{01}^2\right]^{k_{31}}\left[(x + r_{22})^2 + r_{02}^2\right]^{k_{32}}\cdots\, e^{\frac{k_{21}}{r_{01}}\arctan\frac{x + r_{21}}{r_{01}} + \cdots}.$$

Hence $y$ does not vanish for a finite real value of $x$ and the Theorem is fully 
established. 

Theorem II: If $F(x)$ can be separated into a finite number of factors each 
either of the form $(x - r_{1i})$ or $(x^2 + 2r_{2j}x + r_{2j}^2 + r_{0j}^2)$ where no real root is repeated 
and $y = G(x)$ is a bell-shaped curve which is a solution of the differential equation 

$$\frac{dy}{dx} = \frac{y(x - P)}{F(x)},$$

then if $y$ is of unlimited range, $F(x)$ contains no real roots; if $y$ 
is of limited range in one direction, all of the real roots of $F(x)$ lie on the same 
(that) side of $P$; if $y$ is of limited range in both directions, at least one of the real 
roots of $F(x)$ lies on one side of $P$ and at least one on the other. 

Proof: By partial fractions the differential equation may be written: 

$$\frac{dy}{y} = \frac{k_{11}\,dx}{x - r_{11}} + \frac{k_{12}\,dx}{x - r_{12}} + \cdots + \frac{k_{21}\,dx}{(x + r_{21})^2 + r_{01}^2} + \frac{k_{22}\,dx}{(x + r_{22})^2 + r_{02}^2} + \cdots + \frac{2k_{31}(x + r_{21})\,dx}{(x + r_{21})^2 + r_{01}^2} + \frac{2k_{32}(x + r_{22})\,dx}{(x + r_{22})^2 + r_{02}^2} + \cdots$$

and on integrating: 

$$y = C(x - r_{11})^{k_{11}}(x - r_{12})^{k_{12}}\cdots\left[(x + r_{21})^2 + r_{01}^2\right]^{k_{31}}\cdots\, e^{\frac{k_{21}}{r_{01}}\arctan\frac{x + r_{21}}{r_{01}} + \frac{k_{22}}{r_{02}}\arctan\frac{x + r_{22}}{r_{02}} + \cdots}.$$

Hence $y = 0$ for $x = r_{11}, r_{12}, \cdots$ and for no other finite values of $x$ provided $k_{11}$, 
$k_{12}, \cdots$ are positive. If one or more of the $k_{1i}$ are negative, $y = \infty$ at such 
points and unless some $r_{1i}$ closer to $P$ has previously made $y$ vanish, the curve 
is not bell-shaped. Therefore, for bell-shaped curves, the exponent of the factor 
containing the real root of smallest absolute value on each side of $P$ is positive. 
Therefore: if $y$ is of limited range in both directions, at least one real root lies on 
each side of $P$; if $y$ is of unlimited range in one direction, all of the real roots lie 
on the same side of $P$; if $y$ is of unlimited range, $F(x)$ contains no real roots. Hence 
the Theorem is established. 

The effect of repeated real roots will now be considered. If a real root is 
repeated an odd number of times at $x = r$, then $F(x)$ changes sign at $x = r$ 
and the first theorem is true. If a real root is repeated an even number of times 
at $x = r$, then $F(x)$ does not change sign at $x = r$ and we know that either (a) 
$y = 0$ at $x = r$; or (b) $y$ is finite and $\neq 0$ and $dy/dx = \pm\infty$ at $x = r$, i.e. there is a 
point of inflection at $x = r$. It will now be shown that (b) cannot occur. If 
case (b) is possible, $y$ is continuous at $x = r$, $dy/dx = \pm\infty$ according as $(r - P) \lessgtr 0$; 
moreover $dy/dx$ does not change sign in the neighborhood of the point $x = r$, and 
$d^2y/dx^2$ changes sign from $+\infty$ to $-\infty$ or vice versa according as $(r - P) \lessgtr 0$. 
Now 

$$\frac{d^2y}{dx^2} = \frac{y}{[F(x)]^2}\left[(x - P)^2 + F(x) - (x - P)\frac{d}{dx}F(x)\right].$$

Whence if $y$ is finite and $\neq 0$, $d^2y/dx^2$ does not change sign at $x = r$ because it is 
possible to select a neighborhood such that 

$$\left|(x - P)^2\right| > \left|F(x) - (x - P)\frac{d}{dx}F(x)\right|$$

for an $x$ differing from $r$ by $\epsilon$ where $\epsilon$ is a small positive quantity. Therefore 
case (b) is not possible and $y = 0$ when a real root is repeated an even number of 
times. That is to say the range of the curve is limited at a point where a real 
root is repeated an even number of times. Thus Theorem I always holds for 
repeated roots. 

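For Theorem II it is clear that this Theorem holds for repeated roots when a 
non-repeated root lies closer to $P$, and on the same side, than the repeated root. 
Suppose that the repeated root is the nearest root to $P$ (on a given side of $P$). 
Then by partial fractions: 

$$\frac{dy}{y} = \frac{k_{11}\,dx}{x - r_{11}} + \frac{k_{12}\,dx}{(x - r_{11})^2} + \frac{k_{13}\,dx}{(x - r_{11})^3} + \cdots + \frac{k_{21}\,dx}{(x + r_{21})^2 + r_{01}^2} + \frac{k_{22}\,dx}{(x + r_{22})^2 + r_{02}^2} + \cdots + \frac{2k_{31}(x + r_{21})\,dx}{(x + r_{21})^2 + r_{01}^2} + \cdots + \frac{k_{41}\,dx}{x - r_{12}} + \frac{k_{42}\,dx}{x - r_{13}} + \cdots$$

and on integrating: 

$$y = C(x - r_{11})^{k_{11}}(x - r_{12})^{k_{41}}(x - r_{13})^{k_{42}}\cdots\left[(x + r_{21})^2 + r_{01}^2\right]^{k_{31}}\cdots\, e^{-\frac{k_{12}}{x - r_{11}} - \frac{k_{13}}{2(x - r_{11})^2} - \cdots + \frac{k_{21}}{r_{01}}\arctan\frac{x + r_{21}}{r_{01}} + \cdots}.$$

Hence $y$ can $= 0$ only for $x = r_{11}$ or for $x = r_{12}, r_{13}, \cdots$ and for no other finite 
values of $x$. Since by hypothesis $y$ is bell-shaped, then the proper $k$ must be 
positive and Theorem II always holds for repeated roots. 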

Theorems I and II can now be combined and generalized in the form: 

Theorem: If $F(x)$ is a polynomial with real coefficients and $y = G(x)$ is a 
bell-shaped curve which is a solution of the differential equation 

$$\frac{dy}{dx} = \frac{y(x - P)}{F(x)},$$

then the necessary and sufficient condition: that $y$ be of unlimited range in both 
directions is that $F(x)$ have no real roots; that $y$ be of limited range in one direction 
is that all of the real roots of $F(x)$ lie on the same side of $P$; that $y$ be of 
limited range in both directions is that at least one real root of $F(x)$ lie on one 
side of $P$ and one on the other. 

Corollary: $F(x)$ must be negative throughout the range of $y$. 
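
The combined Theorem lends itself to a mechanical check once the real roots of $F(x)$ are known. The following sketch is an illustration only, not the author's procedure: the function name and the returned labels are assumptions, and roots are taken as already computed. It reports the kind of range and the nearest root on each side of the mode, which is where a bell-shaped solution must vanish.

```python
def classify_range(real_roots, mode):
    """Classify the range of a bell-shaped solution of dy/dx = y(x - P)/F(x).

    real_roots: real roots of F(x); mode: the abscissa P.
    Returns (label, lower_limit, upper_limit); a limit is None when the
    curve is unlimited in that direction.
    """
    below = [r for r in real_roots if r < mode]
    above = [r for r in real_roots if r > mode]
    # The curve cannot pass a real root of F without vanishing there, so the
    # nearest root on each side of the mode bounds the range on that side.
    if below and above:
        return ("limited in both directions", max(below), min(above))
    if below:
        return ("limited in one direction", max(below), None)
    if above:
        return ("limited in one direction", None, min(above))
    return ("unlimited in both directions", None, None)

print(classify_range([-3.0, 5.0, 9.0], 1.0))
```

A root reported inside the span of the observed data is exactly the situation that the rough check described next is meant to catch.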

Suppose now that we have some statistics which we wish to graduate and the 
statistics are of such nature that we would expect a bell-shaped curve, rather 
than a J- or U-shaped curve, and we desire the best fit. If we use a curve which 
is a solution of the differential equation 

$$\frac{dy}{dx} = \frac{y(x - P)}{F(x)}$$

(the Pearson Curves being special cases) to fit the statistics and if in computing 
the constants for the curve one of the following cases arises: 

(a) bi < 0 when this constant is computed, 
or (b) Bo < 0 when the origin is moved to the mode, 
or (c) a root is located within the range of the statistics, then it means that: 

1. A mistake may have been made in the computation: thus the Theorem 
just established provides a rough check on the work of computation. 

2. If no mistake has been made in the computation it may indicate that the 
bell-shaped Pearson Curves will not closely fit the statistics and that some 
other graduation curves should be used, e.g. the Gram-Charlier Types A or B might be 
tried. 

3. If no mistake has been made in the computation it may happen that one 
of the bell-shaped Pearson Curves will give an excellent fit but a different method 
than, or a modification of, the Method of Moments should be used in order to 
compute the constants. 

Part 3. Computing the Constants 

At present, the constants of a frequency curve are computed as follows: 
First the moments are computed about an arbitrary origin, then the moments 
about the A.M. are determined, then $\beta_1$ and $\beta_2$ and the criterion are computed, 
after which the type of curve can be selected. From this point a separate 
procedure is followed for each curve. Now in the above method one will not 
know whether a root has been located in the range of statistics or not. 

Take Pearson's differential equation 

$$\frac{dy}{dx} = \frac{y(x - P)}{b_2x^2 + b_1x + b_0}.$$

Put $X = x - P$. Then $dX = dx$ and $x = X + P$, and 

$$\frac{dy}{dX} = \frac{yX}{b_2(X + P)^2 + b_1(X + P) + b_0} = \frac{yX}{b_2X^2 + (2Pb_2 + b_1)X + P^2b_2 + Pb_1 + b_0}.$$

Now put 

$$b_2 = B_2,$$
$$2Pb_2 + b_1 = B_1,$$
$$P^2b_2 + Pb_1 + b_0 = B_0.$$

Then we have 

$$\frac{dy}{dX} = \frac{yX}{B_2X^2 + B_1X + B_0} \quad\text{or}\quad \frac{dy}{dx} = \frac{y(x - P)}{B_2(x - P)^2 + B_1(x - P) + B_0}. \tag{1}$$


It should be noted that for a particular curve, $B_2$, $B_1$ and $B_0$ are constants; 
i.e., their values do not change with a change of the origin. The values of $b_1$ 
and $b_0$ do change with a change in the origin. 

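If we clear equation (1) of fractions, multiply by $e^{\eta x}$ and integrate with respect 
to $x$ over the range from $x_1$ to $x_2$, where 

$$e^{\lambda_1\eta + \frac{\lambda_2\eta^2}{2!} + \frac{\lambda_3\eta^3}{3!} + \cdots} = \int_{x_1}^{x_2} e^{\eta x}\,y\,dx,$$

then successively differentiate with respect to $\eta$, and equate coefficients of 
like powers of $\eta$, we finally obtain: 

$$\lambda_1 - P + B_1 - 2PB_2 + 2B_2\lambda_1 = 0,$$
$$\lambda_2 + B_0 - PB_1 + P^2B_2 + B_1\lambda_1 - 2PB_2\lambda_1 + 3B_2\lambda_2 + B_2\lambda_1^2 = 0,$$
$$\lambda_3 + 2\lambda_2B_1 - 4PB_2\lambda_2 + 4B_2\lambda_3 + 4B_2\lambda_1\lambda_2 = 0, \tag{2}$$
$$\lambda_4 + 3B_1\lambda_3 - 6PB_2\lambda_3 + 5B_2\lambda_4 + 6B_2\lambda_2^2 + 6B_2\lambda_1\lambda_3 = 0.$$

Since we can compute the moments from the raw statistics and the semi-invariants 
from the moments, we may regard $\lambda_2$, $\lambda_3$ and $\lambda_4$ in these equations as 
knowns and the $B_0$, $B_1$, $B_2$, $P$ and $\lambda_1$ as unknowns. But the origin has not yet 
been specified. Let the origin be placed at the A.M., where $\lambda_1 = 0$, and let $P_0$ 
denote the abscissa of the mode measured from this origin. As 
$\lambda_2$, $\lambda_3$, $\lambda_4$, $B_0$, $B_1$ and $B_2$ are unchanged by a change of origin, we have: 

$$B_1 - P_0 - 2P_0B_2 = 0,$$
$$\lambda_2 + B_0 - P_0B_1 + P_0^2B_2 + 3B_2\lambda_2 = 0,$$
$$\lambda_3 + 2B_1\lambda_2 - 4P_0B_2\lambda_2 + 4B_2\lambda_3 = 0, \tag{3}$$
$$\lambda_4 + 3B_1\lambda_3 - 6P_0B_2\lambda_3 + 5B_2\lambda_4 + 6B_2\lambda_2^2 = 0.$$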


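Now put 

$$b_2' = B_2,$$
$$b_1' = B_1 - 2P_0B_2, \tag{4}$$
$$b_0' = B_0 - P_0B_1 + P_0^2B_2,$$

then 

$$b_1' - P_0 = 0,$$
$$\lambda_2 + b_0' + 3b_2'\lambda_2 = 0,$$
$$\lambda_3 + 2b_1'\lambda_2 + 4b_2'\lambda_3 = 0, \tag{5}$$
$$\lambda_4 + 3b_1'\lambda_3 + 5b_2'\lambda_4 + 6b_2'\lambda_2^2 = 0.$$

By reversing the transformation (4) we get: 

$$B_2 = b_2',$$
$$B_1 = b_1' + 2P_0b_2', \tag{6}$$
$$B_0 = b_0' + P_0(b_1' + P_0b_2').$$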

Now the above theory suggests the following procedure for computing the 
constants of a frequency curve: First the moments are computed about an 
arbitrary origin, then the semi-invariants are computed (or alternatively the 
moments about the A.M.; either step involves about the same amount of work), 
then the equations (5) are solved and then by means of equations (6) the $B_2$, 
$B_1$ and $B_0$ are computed. Next solve the quadratic equation 

$$B_2X^2 + B_1X + B_0 = 0.$$

The character of the roots of this equation indicates which type to use and it is 
unnecessary to compute the criterion. The constants of the frequency curve 
are simple functions of the roots of the above quadratic equation and can be 
readily found by integrating the differential equation (1), being careful to write the solution 
as a function of $X = x - P$. The rough checks mentioned in Part 2 can be 
quickly and conveniently applied when this procedure is followed. 
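
The procedure just described can be sketched numerically. The code below is an illustration under stated assumptions rather than the author's own computation: it takes the semi-invariants $\lambda_2, \lambda_3, \lambda_4$ as given, treats the last two equations of (5) as a linear system $2\lambda_2 b_1' + 4\lambda_3 b_2' = -\lambda_3$, $3\lambda_3 b_1' + (5\lambda_4 + 6\lambda_2^2) b_2' = -\lambda_4$, and then applies (6). The function names are hypothetical.

```python
import math

def pearson_constants(l2, l3, l4):
    """Solve equations (5), then (6), with the origin at the arithmetic mean.

    Returns (P0, B2, B1, B0), where P0 is the mode measured from the mean.
    """
    # Last two lines of (5): a 2x2 linear system in b1' and b2'.
    a11, a12, r1 = 2.0 * l2, 4.0 * l3, -l3
    a21, a22, r2 = 3.0 * l3, 5.0 * l4 + 6.0 * l2 ** 2, -l4
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("equations (5) are singular for these semi-invariants")
    b1 = (r1 * a22 - a12 * r2) / det      # Cramer's rule
    b2 = (a11 * r2 - a21 * r1) / det
    b0 = -l2 * (1.0 + 3.0 * b2)           # second line of (5)
    p0 = b1                               # first line of (5)
    # Equations (6).
    return p0, b2, b1 + 2.0 * p0 * b2, b0 + p0 * (b1 + p0 * b2)

def denominator_roots(B2, B1, B0):
    """Real roots of B2 X^2 + B1 X + B0 = 0 (X measured from the mode)."""
    if B2 == 0:
        return [] if B1 == 0 else [-B0 / B1]
    disc = B1 * B1 - 4.0 * B2 * B0
    if disc < 0:
        return []
    s = math.sqrt(disc)
    return sorted([(-B1 - s) / (2.0 * B2), (-B1 + s) / (2.0 * B2)])

# Example: semi-invariants 4, 8, 24 (a Type III curve x^3 e^{-x}, mode at x = 3).
P0, B2, B1, B0 = pearson_constants(4.0, 8.0, 24.0)
print(P0, B2, B1, B0, denominator_roots(B2, B1, B0))
```

In the example the single root lies three units below the mode, i.e. at the lower limit of the curve, and $B_0$ comes out negative, in agreement with the Corollary of Part 2.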

George Washington University. 




A RECONSIDERATION OF SHEPPARD’S CORRECTIONS 
By W. T. Lewis¹ 

In computing the moments of a frequency distribution it is customary to find 
first what are known as the raw moments. These are obtained on the assump- 
tion that all the material of each class interval is concentrated at the middle 
point of the interval. It introduces what is called a grouping error because in 
fact the material does not all lie at the middle point. To compensate for this 
error W. F. Sheppard² derived a set of corrections. The hypothesis underlying 
his method is that the distribution may be regarded as similar to one to which 
the Euler-MacLaurin summation formula without its end terms may be applied. 
He presupposed such a curve, found its true moments, and then the raw moments 
that would be obtained if its area were concentrated at several equidistant 
abscissae. The relationship between these raw moments and the true moments 
of the curve furnished him with the corrections required for that distribution. 
If now our observed distribution may be supposed to be sufficiently like that one, 
we may use his corrections also on the observed data. One may note four points 
of criticism. 

(1) The given distribution may not be similar to the one suggested, in the 
sense that it would be close to such a curve if the intervals of grouping were 
made very small; or at all events the purpose of finding the moments may be in 
part to decide whether or not it would become such a curve, and so one would 
not like to assume that to be true at the outset. A special case of importance 
in which this last is true occurs when one is finding the moments of a sample in 
order to determine whether it may have been drawn from a presupposed universe. 
It is inexact to use raw moments but it is illogical to use corrections that have 
been proved only for the universe being tested. 

(2) Sheppard’s argument does not make use of the one certain fact that is 
given in the hypothesis, viz: that the partial area of the given distribution over 
each class interval is exactly as stated. In fact, if, following the argument of 
some authors, the given curve be assumed to be exponential, it obviously cannot 
have partial areas everywhere exactly equal to the several given frequencies, 
for in particular its partial area is not zero beyond the given range. 

(3) It is common to find distributions which do not have high contact at the 
ends of the range and for them Sheppard's corrections certainly fail. To 
obviate this criticism new corrections have been derived by Pairman and Pearson 
for the so-called abrupt cases. These new corrections are adequate to care 
for the abrupt cases but involve so much computation that it is a fair question 
whether it would not be simpler, first to distribute the given material over each 
interval by a smoothing process, and then to find without corrections the 
moments of the smoothed distribution. 

¹ With the assistance of Burton H. Camp. 

² The true values are given on page 220 of "Mathematical Part of Elementary Statistics," 
by Camp, D. C. Heath and Company, 1931. 

(4) Even if one admits Sheppard’s method in general, waiving the dubious 
question as to whether it is proper to start with an assumed curve instead of 
starting with the given distribution, it is doubtful whether there are any curves 
which have exactly the properties required. The high contact hypothesis may 
be put in different language as follows: using the notation of the Handbook³, 
page 92, let $f(x)$ be the curve and $x_i$ be the middle point of the slice. It is 
assumed that 

$$\sum_i c\,x_i^l f^{(r)}(x_i) = \int x^l f^{(r)}(x)\,dx; \quad l = 0, 1, \cdots;\ r = 0, 1, \cdots;$$

c being the class interval. This means that if the moments of the curve be 
found by using mid-ordinates times class interval, instead of areas, one will obtain 
exactly the true moments of the curve, and that this will remain true for all the 
curves which are derivatives of this curve. This property is certainly not true 
of the normal curve; but it is almost true when r and the class interval are both 
small, and it is probably due to this fact that Sheppard’s corrections seem to be 
good in practice. 

Moreover, this high contact hypothesis cannot be true for any function over a 
limited range if the function is developable in Taylor’s series about one end of the 
range. For the only function which has the required properties is identically 
zero, since the function and all its derivatives are required to vanish at that end 
of the range. 

The primary purpose of this paper, therefore, is to derive corrections similar to 
Sheppard’s with a different set of assumptions. The results may be used as an 
approximate substitute for both Sheppard’s and Pairman’s. That is, they will 
apply approximately to both extreme cases and to the intermediate cases ; on the 
whole they give better results than Sheppard’s and are not so difficult to admin- 
ister as Pairman’s. 

The argument runs as follows. When a distribution is given merely by class 
intervals, there is no way of knowing exactly what the distribution would have 
been had the class intervals been smaller; we do not know that we have a sample 
from an exponential curve, and even if we did we would not know that this 
sample would lie close to the exponential in form. We shall, however, try to 
draw a graduating curve in such a manner that (a) its partial area over each class 
interval will equal the frequency of the given distribution over that interval; 
and (b) its form within each class interval will be such that it will pass smoothly 
into the adjacent portions to the right and left. A good way to do this is by a 
freehand graph, frankly recognizing that there are many forms that will do 
equally well. To obtain a numerical result it is necessary to use the equation 
of some curve. Again frankly recognizing that there are many types which 
will do equally well we choose the simplest to handle: 

$$y = a + bt + ct^2.$$

³ H. L. Rietz, "Handbook of Math. Stat.," Houghton Mifflin Co. (1924). 

Let the relative frequency distribution be defined by $f(i)$, $-m \leq i \leq n$, $m$, $n$, $i$ 
being integers. To satisfy (a) we have the equation 

$$\int_{i-\frac12}^{i+\frac12} y\,dt = f(i).$$

To satisfy (b) we shall let 

$$y = \tfrac12\left[f(i) + f(i + 1)\right] \quad\text{if}\quad t = i + \tfrac12.$$

The latter will hold for all values of $i$ from $-m$ to $n - 1$ inclusive, but the end 
intervals require special treatment. Here in order to satisfy as well as possible 
both the high contact and the abrupt cases, we wish to let the material be 
distributed according to the way the curve is behaving over the two nearest 
intervals on the right (at $n$) or left (at $-m$) rather than by the addition of zero 
frequencies beyond the given limits. To do this we let the slope of the parabolas 
be zero at the extremes: 

$$\frac{dy}{dt} = 0 \quad\text{at}\quad t = -m - \tfrac12 \quad\text{and}\quad t = n + \tfrac12.$$
Then, if for example the frequencies are increasing as one nears the right end 
interval, the curve will rise over the right end interval; if they are decreasing, 
it will fall. These three conditions are sufficient to determine a continuous 
curve of the sort indicated in the figure. The exact moments of the curve may 
be found by integration and expressed in terms of the raw moments. The 
details are tedious and of an elementary nature and will be given only for the 
mean value $\bar\nu_1$. 

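To determine the coefficients of the parabola $y = a + bt + ct^2$ for the rectangle 
at $t = i$ we may write the following three equations; the first complying with the 
requirement that the area under the parabola from $t = i - \tfrac12$ to $t = i + \tfrac12$ equals 
the area of the rectangle at $t = i$, the second and third giving the ordinates at 
$i + \tfrac12$ and $i - \tfrac12$ respectively: 

$$f(i) = \int_{i-\frac12}^{i+\frac12}(a + bt + ct^2)\,dt,$$
$$\frac{f(i) + f(i + 1)}{2} = a + b\left(i + \tfrac12\right) + c\left(i + \tfrac12\right)^2,$$
$$\frac{f(i) + f(i - 1)}{2} = a + b\left(i - \tfrac12\right) + c\left(i - \tfrac12\right)^2.$$

Solving these three simultaneous equations we get for $a$, $b$, and $c$: 

$$a = \left(\tfrac54 - 3i^2\right)f(i) + \left(\tfrac32 i^2 - \tfrac12 i - \tfrac18\right)f(i + 1) + \left(\tfrac32 i^2 + \tfrac12 i - \tfrac18\right)f(i - 1),$$
$$b = 6i\,f(i) + \left(\tfrac12 - 3i\right)f(i + 1) - \left(\tfrac12 + 3i\right)f(i - 1),$$
$$c = -3f(i) + \tfrac32 f(i + 1) + \tfrac32 f(i - 1),$$

and these hold for $-m + 1 \leq i \leq n - 1$. 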

For the parabola $y = a_1 + b_1t + c_1t^2$ over the first rectangle, i.e., where 
$i = -m$, we get the equations: 

$$f(-m) = \int_{-m-\frac12}^{-m+\frac12}(a_1 + b_1t + c_1t^2)\,dt,$$
$$\frac{f(-m) + f(-m + 1)}{2} = a_1 + b_1\left(-m + \tfrac12\right) + c_1\left(-m + \tfrac12\right)^2,$$
$$b_1 + 2c_1\left(-m - \tfrac12\right) = 0,$$

and their solutions: 

$$a_1 = \tfrac34\left(m^2 + m - \tfrac1{12}\right)f(-m + 1) - \tfrac34\left(m^2 + m - \tfrac{17}{12}\right)f(-m),$$
$$b_1 = \tfrac34(2m + 1)f(-m + 1) - \tfrac34(2m + 1)f(-m),$$
$$c_1 = \tfrac34 f(-m + 1) - \tfrac34 f(-m).$$

Similarly for the parabola $y = a_n + b_nt + c_nt^2$ through the last rectangle at 
$i = n$ we get 

$$f(n) = \int_{n-\frac12}^{n+\frac12}(a_n + b_nt + c_nt^2)\,dt,$$
$$\frac{f(n) + f(n - 1)}{2} = a_n + b_n\left(n - \tfrac12\right) + c_n\left(n - \tfrac12\right)^2,$$
$$b_n + 2c_nn + c_n = 0,$$

and for the constants 

$$a_n = \tfrac34\left(n^2 + n - \tfrac1{12}\right)f(n - 1) - \tfrac34\left(n^2 + n - \tfrac{17}{12}\right)f(n),$$
$$b_n = -\tfrac34(1 + 2n)f(n - 1) + \tfrac34(1 + 2n)f(n),$$
$$c_n = \tfrac34 f(n - 1) - \tfrac34 f(n).$$
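
The interior-cell constants can be checked against the three conditions that define them. The sketch below is an illustration, not part of the original paper: the function names are hypothetical, the frequencies in the example are made up, and exact rational arithmetic is used so that the checks hold without rounding.

```python
from fractions import Fraction

def interior_parabola(i, f_prev, f_mid, f_next):
    """Coefficients (a, b, c) of y = a + b t + c t^2 over the cell centred at t = i.

    f_prev, f_mid, f_next stand for f(i - 1), f(i), f(i + 1).
    """
    i = Fraction(i)
    a = ((Fraction(5, 4) - 3 * i**2) * f_mid
         + (Fraction(3, 2) * i**2 - i / 2 - Fraction(1, 8)) * f_next
         + (Fraction(3, 2) * i**2 + i / 2 - Fraction(1, 8)) * f_prev)
    b = (6 * i * f_mid + (Fraction(1, 2) - 3 * i) * f_next
         - (Fraction(1, 2) + 3 * i) * f_prev)
    c = -3 * f_mid + Fraction(3, 2) * (f_next + f_prev)
    return a, b, c

def cell_area(a, b, c, i):
    """Exact integral of a + b t + c t^2 from i - 1/2 to i + 1/2."""
    lo, hi = Fraction(i) - Fraction(1, 2), Fraction(i) + Fraction(1, 2)
    prim = lambda t: a * t + b * t**2 / 2 + c * t**3 / 3
    return prim(hi) - prim(lo)

def ordinate(a, b, c, t):
    """Value of the parabola at abscissa t."""
    return a + b * t + c * t**2

# Illustrative frequencies f(1), f(2), f(3) for the cell at i = 2.
f1, f2, f3 = Fraction(1, 10), Fraction(3, 10), Fraction(1, 5)
a, b, c = interior_parabola(2, f1, f2, f3)
print(cell_area(a, b, c, 2), ordinate(a, b, c, Fraction(5, 2)), ordinate(a, b, c, Fraction(3, 2)))
```

The three printed values should reproduce $f(i)$ and the two mid-boundary averages, which is all that conditions (a) and (b) require of an interior cell.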

Having obtained the constants for the graduating curve we will determine 
the moments of this curve in terms of those of the given frequency distribution. 

Notation: Let the class interval be $c = 1$; let $\nu_s = \sum_{i=-m}^{n} i^s f(i)$ be the uncorrected 
$s$th moment of the given frequency distribution about the given origin; 
let $\mu_s = \sum_{i=-m}^{n}(i - \nu_1)^s f(i)$ be the uncorrected $s$th moment of the given frequency 
distribution about its uncorrected mean; let $\bar\nu_s$ be the corrected value of 
the $s$th moment about the given origin; and let $\bar\mu_s$ be the corrected value of 
the $s$th moment about the corrected mean. Thus $\nu_s$ and $\mu_s$ apply to the rectangles, 
and $\bar\nu_s$ and $\bar\mu_s$ apply to the curves as follows: 

$$\bar\nu_s = \sum_{i=-m+1}^{n-1}\int_{i-\frac12}^{i+\frac12} t^s(a + bt + ct^2)\,dt + \int_{-m-\frac12}^{-m+\frac12} t^s(a_1 + b_1t + c_1t^2)\,dt + \int_{n-\frac12}^{n+\frac12} t^s(a_n + b_nt + c_nt^2)\,dt,$$

$$\bar\mu_s = \sum_{i=-m+1}^{n-1}\int_{i-\frac12}^{i+\frac12}(t - \bar\nu_1)^s(a + bt + ct^2)\,dt + \int_{-m-\frac12}^{-m+\frac12}(t - \bar\nu_1)^s(a_1 + b_1t + c_1t^2)\,dt + \int_{n-\frac12}^{n+\frac12}(t - \bar\nu_1)^s(a_n + b_nt + c_nt^2)\,dt.$$

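Using these symbols we have for the first moment about the given origin: 

$$\bar\nu_1 = \sum_{i=-m+1}^{n-1}\int_{i-\frac12}^{i+\frac12} t(a + bt + ct^2)\,dt + \int_{-m-\frac12}^{-m+\frac12} t(a_1 + b_1t + c_1t^2)\,dt + \int_{n-\frac12}^{n+\frac12} t(a_n + b_nt + c_nt^2)\,dt$$

$$= \sum_{i=-m+1}^{n-1}\left[ai + b\left(i^2 + \tfrac1{12}\right) + c\left(i^3 + \tfrac i4\right)\right] + \left[-a_1m + b_1\left(m^2 + \tfrac1{12}\right) - c_1\left(m^3 + \tfrac m4\right)\right] + \left[a_nn + b_n\left(n^2 + \tfrac1{12}\right) + c_n\left(n^3 + \tfrac n4\right)\right].$$

Substituting the values for the constants this becomes 

$$\bar\nu_1 = \sum_{i=-m+1}^{n-1}\Big\{ i\left[\left(\tfrac54 - 3i^2\right)f(i) + \left(\tfrac32i^2 - \tfrac12i - \tfrac18\right)f(i + 1) + \left(\tfrac32i^2 + \tfrac12i - \tfrac18\right)f(i - 1)\right]$$
$$\qquad + \left(i^2 + \tfrac1{12}\right)\left[6i\,f(i) + \left(\tfrac12 - 3i\right)f(i + 1) - \left(\tfrac12 + 3i\right)f(i - 1)\right] + \left(i^3 + \tfrac i4\right)\left[-3f(i) + \tfrac32f(i + 1) + \tfrac32f(i - 1)\right]\Big\}$$
$$\qquad + \Big\{-m\left[\tfrac34\left(m^2 + m - \tfrac1{12}\right)f(-m + 1) - \tfrac34\left(m^2 + m - \tfrac{17}{12}\right)f(-m)\right] + \left(m^2 + \tfrac1{12}\right)\left[\tfrac34(2m + 1)f(-m + 1) - \tfrac34(2m + 1)f(-m)\right]$$
$$\qquad - \left(m^3 + \tfrac m4\right)\left[\tfrac34f(-m + 1) - \tfrac34f(-m)\right]\Big\}$$
$$\qquad + \Big\{n\left[\tfrac34\left(n^2 + n - \tfrac1{12}\right)f(n - 1) - \tfrac34\left(n^2 + n - \tfrac{17}{12}\right)f(n)\right] + \left(n^2 + \tfrac1{12}\right)\left[-\tfrac34(1 + 2n)f(n - 1) + \tfrac34(1 + 2n)f(n)\right]$$
$$\qquad + \left(n^3 + \tfrac n4\right)\left[\tfrac34f(n - 1) - \tfrac34f(n)\right]\Big\}$$

$$= \sum_{i=-m+1}^{n-1}\left[i\,f(i) + \tfrac1{24}f(i + 1) - \tfrac1{24}f(i - 1)\right] - \left(m + \tfrac1{16}\right)f(-m) + \tfrac1{16}f(-m + 1) + \left(n + \tfrac1{16}\right)f(n) - \tfrac1{16}f(n - 1).$$

Now 

$$\sum_{i=-m+1}^{n-1} i\,f(i) = \sum_{i=-m}^{n} i\,f(i) - (-m)f(-m) - nf(n) = \nu_1 + mf(-m) - nf(n),$$

and 

$$\sum_{i=-m+1}^{n-1}\left[f(i + 1) - f(i - 1)\right] = f(n) + f(n - 1) - f(-m) - f(-m + 1),$$

so that 

$$\bar\nu_1 = \nu_1 + mf(-m) - nf(n) + \tfrac1{24}\left[f(n) + f(n - 1) - f(-m) - f(-m + 1)\right] - \left(m + \tfrac1{16}\right)f(-m) + \tfrac1{16}f(-m + 1) + \left(n + \tfrac1{16}\right)f(n) - \tfrac1{16}f(n - 1),$$

$$\bar\nu_1 = \nu_1 - \tfrac5{48}f(-m) + \tfrac5{48}f(n) + \tfrac1{48}f(-m + 1) - \tfrac1{48}f(n - 1).$$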

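Using this same notation and method for the higher moments we get 

$$\bar\mu_2 = \nu_2 - \frac1{12} - \bar\nu_1^2 + \left(\frac{5m}{24} + \frac7{80}\right)f(-m) + \left(\frac{5n}{24} + \frac7{80}\right)f(n) - \left(\frac m{24} + \frac1{240}\right)f(-m + 1) - \left(\frac n{24} + \frac1{240}\right)f(n - 1),$$

$$\bar\mu_3 = \nu_3 - 3\bar\nu_1\bar\mu_2 - \frac{\bar\nu_1}4 - \bar\nu_1^3 + f(-m)\left[-\frac{5m^2}{16} - \frac{21m}{80} - \frac{17}{120}\right] + f(n)\left[\frac{5n^2}{16} + \frac{21n}{80} + \frac{17}{120}\right]$$
$$\qquad + f(-m + 1)\left[\frac{m^2}{16} + \frac m{80} + \frac1{120}\right] + f(n - 1)\left[-\frac{n^2}{16} - \frac n{80} - \frac1{120}\right],$$

$$\bar\mu_4 = \nu_4 - 4\bar\mu_3\bar\nu_1 - 6\bar\mu_2\bar\nu_1^2 - \bar\nu_1^4 - \frac{\bar\mu_2}2 - \frac{\bar\nu_1^2}2 - \frac{17}{64} + f(-m)\left[\frac{5m^3}{12} + \frac{21m^2}{40} + \frac{17m}{30} + \frac{313}{1680}\right] + f(n)\left[\frac{5n^3}{12} + \frac{21n^2}{40} + \frac{17n}{30} + \frac{313}{1680}\right]$$
$$\qquad + f(-m + 1)\left[-\frac{m^3}{12} - \frac{m^2}{40} - \frac m{30} - \frac1{336}\right] + f(n - 1)\left[-\frac{n^3}{12} - \frac{n^2}{40} - \frac n{30} - \frac1{336}\right].$$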

SPECIAL CASES 

The above formulae are rather long and in practice the special cases below 
will frequently be preferred. 

(a) We may usually take the origin at or very near the middle of the range so 
that $m = n$, at least approximately. 
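If $m = n$: 

$$\bar\nu_1 = \nu_1 - \frac5{48}\left[f(-m) - f(n)\right] + \frac1{48}\left[f(-m + 1) - f(n - 1)\right],$$

$$\bar\mu_2 = \nu_2 - \frac1{12} - \bar\nu_1^2 + \left(\frac{5m}{24} + \frac7{80}\right)\left[f(-m) + f(n)\right] - \left(\frac m{24} + \frac1{240}\right)\left[f(-m + 1) + f(n - 1)\right],$$

$$\bar\mu_3 = \nu_3 - 3\bar\nu_1\bar\mu_2 - \frac{\bar\nu_1}4 - \bar\nu_1^3 - \left(\frac{5m^2}{16} + \frac{21m}{80} + \frac{17}{120}\right)\left[f(-m) - f(n)\right] + \left(\frac{m^2}{16} + \frac m{80} + \frac1{120}\right)\left[f(-m + 1) - f(n - 1)\right],$$

$$\bar\mu_4 = \nu_4 - 4\bar\mu_3\bar\nu_1 - 6\bar\mu_2\bar\nu_1^2 - \bar\nu_1^4 - \frac{\bar\mu_2}2 - \frac{\bar\nu_1^2}2 - \frac{17}{64} + \left(\frac{5m^3}{12} + \frac{21m^2}{40} + \frac{17m}{30} + \frac{313}{1680}\right)\left[f(-m) + f(n)\right]$$
$$\qquad - \left(\frac{m^3}{12} + \frac{m^2}{40} + \frac m{30} + \frac1{336}\right)\left[f(-m + 1) + f(n - 1)\right].$$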

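(b) Except in the abrupt cases the end frequencies and the difference between 
those next to the ends will be so small (relative to unity) that they will have a 
negligible effect on the corrections. If $m = n$ as in (a), and if also 
$f(-m) = f(n) = 0$ and $f(-m + 1) - f(n - 1) = 0$: 

$$\bar\nu_1 = \nu_1,$$

$$\bar\mu_2 = \nu_2 - \frac1{12} - \bar\nu_1^2 - \left(\frac m{12} + \frac1{120}\right)f(-m + 1),$$

$$\bar\mu_3 = \nu_3 - 3\bar\nu_1\bar\mu_2 - \frac{\bar\nu_1}4 - \bar\nu_1^3,$$

$$\bar\mu_4 = \nu_4 - 4\bar\mu_3\bar\nu_1 - 6\bar\mu_2\bar\nu_1^2 - \bar\nu_1^4 - \frac{\bar\mu_2}2 - \frac{\bar\nu_1^2}2 - \frac{17}{64} - \left(\frac{m^3}6 + \frac{m^2}{20} + \frac m{15} + \frac1{168}\right)f(-m + 1).$$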




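These formulae have been written in the form which makes the computing 
simple. The following makes a comparison with Sheppard's corrections easy: 

$$\bar\nu_1 = \nu_1,$$

$$\bar\mu_2 = \mu_2 - \frac1{12} - f(-m + 1)\left\{\frac m{12} + \frac1{120}\right\},$$

$$\bar\mu_3 = \mu_3 + \nu_1\left(\frac m4 + \frac1{40}\right)f(-m + 1),$$

$$\bar\mu_4 = \mu_4 - \frac{\mu_2}2 - \frac{43}{192} + f(-m + 1)\left[-\frac{m^3}6 - \frac{m^2}{20} - \frac m{40} - \frac1{560} - \frac{m\nu_1^2}2 - \frac{\nu_1^2}{20}\right].$$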


The following special case is also useful in comparing my formulae with 
Sheppard's. 

(c) Let $f(-m) = \tfrac15 f(-m + 1)$ and $f(n) = \tfrac15 f(n - 1)$. This produces a 
graduating curve which is exactly tangent to the t-axis at the ends of the range 
and is everywhere continuous, though it does not have continuous derivatives 
at certain isolated points. It is, however, a curve which to the eye cannot be 
distinguished from the type assumed in the Euler-MacLaurin theorem, which 
lies at the base of Sheppard's formulae. My corrections become: 

$$\bar\nu_1 = \nu_1,$$

$$\bar\mu_2 = \mu_2 - \frac1{12} + \frac1{15}\left[f(-m) + f(n)\right],$$

$$\bar\mu_3 = \mu_3 - \frac{\nu_1}5\left[f(-m) + f(n)\right] + \left[-\frac m5 - \frac1{10}\right]f(-m) + \left[\frac n5 + \frac1{10}\right]f(n),$$

$$\bar\mu_4 = \mu_4 - \frac{\mu_2}2 - \frac{43}{192} + \frac25\left[(\nu_1 + m)^2 + \nu_1 + m + \frac{29}{84}\right]f(-m) + \frac25\left[(\nu_1 - n)^2 - \nu_1 + n + \frac{29}{84}\right]f(n).$$


Sheppard’s are: 


Vl = Vl, 

M2 = M2-^. 


M3 = M3 , 


M2 . 7 

2 240 ■ 


Let us compare my results with Sheppard's in the very special case in which
$f(-m) = f(n) = 1/7$, $f(0) = 5/7$, $m = n = 1$. The odd moments vanish.
My corrections for $\mu_2$ and $\mu_4$ are

$\mu_2 = 0.2214$, $\mu_4 = 0.1870$.

Sheppard's are

$\mu_2 = 0.2024$, $\mu_4 = 0.1720$.

The numerical difference between the $\mu_2$'s is 0.0190, and the numerical difference
between the $\mu_4$'s is 0.0150.
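The Sheppard figures quoted for this example can be checked with a short computation. The sketch below is only an illustration under stated assumptions: unit class width, the three class frequencies 1/7, 5/7, 1/7 placed at -1, 0, 1, and the usual form of Sheppard's corrections for the second and fourth moments. The names `freq` and `raw_moment` are introduced here for the illustration. It does not attempt to reproduce the author's own corrections.

```python
from fractions import Fraction

# Class frequencies of the example: f(-1) = f(1) = 1/7, f(0) = 5/7 (unit class width).
freq = {-1: Fraction(1, 7), 0: Fraction(5, 7), 1: Fraction(1, 7)}

def raw_moment(k):
    """k-th moment of the grouped data about the origin (the mean is 0 here)."""
    return sum(f * x ** k for x, f in freq.items())

nu2, nu4 = raw_moment(2), raw_moment(4)

# Sheppard's corrections for unit class width (assumed standard form).
mu2 = nu2 - Fraction(1, 12)
mu4 = nu4 - nu2 / 2 + Fraction(7, 240)

print(float(mu2), float(mu4))
```

Rounded to four places the two values agree with the 0.2024 and 0.1720 given in the text.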

This example shows that Sheppard's corrections are not valid to the precision
to which they are usually given if they are to be used for the purpose of correcting
raw moments. The last term in the fourth moment correction, 7/240, might 
equally well be, for example, —43/192 as in my special case. This will become 
more evident to the reader if he will draw the curve indicated in this example. 
To the eye it will appear exactly like the kind specified in the Euler-MacLaurin 
theorem; for example, much like the normal curve. Now suppose one adopted 
for the moment the point of view (which I have criticized earlier) of starting 
with the curve used in this example, breaking it up into three partial areas and 
then finding the relation between the true and the raw moments. The partial 
areas found would be exactly those used in this example and this method would 
give us Sheppard’s corrections, but they would not be exactly correct, for in this 
instance my formulae give exactly the relationship between the true and the raw 
moments. The difference is due to the fact that in this instance the assumptions
permitting the use of the Euler-MacLaurin theorem in abbreviated form
are not justified for this curve. But there is no way of telling at the outset, if 
one has given initially only the partial areas, whether precisely this curve or 
another which to the eye would appear very much like it is truly the curve which 
will graduate the same material when subjected to a finer classification. 



THE POINT BINOMIAL AND PROBABILITY PAPER
By Frank H. Byron ¹

1. An approximation to the sum of a number of consecutive terms of the point
binomial may be found graphically and quite expeditiously by means of so-called
"probability paper." This paper is ruled so that the (x, y) graph of the
equation of the integral of the normal curve

$$y = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-x^2/2}\,dx \qquad (1)$$

is a straight line. Let the successive terms of the point binomial be represented
as follows:

$$(p + q)^n = u_0 + u_1 + \cdots + u_t + \cdots + u_n, \qquad (2)$$

where $u_t = {}_nC_t\, p^{n-t} q^t$ and $p \ge q$. Then the (x, y) graph of the equation,

$$y = \sum_{i=0}^{t} u_i, \qquad t + \tfrac{1}{2} = x, \qquad (3)$$

i.e., of the sum of the first (t + 1) terms of this point binomial, is, in all but extreme
cases, a set of points lying on a gently turning curve, so gently that its form may
be represented closely by two straight lines, each passing through the median
point as will be explained in the next section. As paper of this sort is readily
obtainable, and as this method yields as great accuracy as is really useful in
many problems, it is suggested that its use ought to be quite general.
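The claim that the plotted points turn only gently can be examined numerically. The sketch below is an illustration, not part of the original method: it assumes a hypothetical binomial with n = 25 and q = 1/3, converts each cumulative sum to its normal deviate (the vertical position on probability paper), and prints the step between successive points. Equal steps would mean a single straight line. The helper name `cumulative_sums` is introduced here.

```python
from math import comb
from statistics import NormalDist

def cumulative_sums(n, q):
    """Return S_(t+1) = u_0 + ... + u_t for t = 0..n, with u_t = C(n,t) p^(n-t) q^t."""
    p = 1.0 - q
    total, sums = 0.0, []
    for t in range(n + 1):
        total += comb(n, t) * p ** (n - t) * q ** t
        sums.append(total)
    return sums

n, q = 25, 1 / 3          # hypothetical example values
sums = cumulative_sums(n, q)
normal = NormalDist()

# Normal deviate of each cumulative sum; only sums strictly inside (0, 1) can be plotted.
deviates = {t: normal.inv_cdf(s) for t, s in enumerate(sums) if 1e-6 < s < 1 - 1e-6}

# Successive differences would all be equal if the points lay on one straight line.
steps = {t: deviates[t + 1] - deviates[t] for t in deviates if t + 1 in deviates}
for t in sorted(steps):
    print(f"t = {t:2d} -> {t + 1:2d}   step in normal deviate = {steps[t]:.3f}")
```

For this assumed case the steps drift slowly rather than staying constant, which is the gentle curvature that the two-line construction is meant to absorb.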

2. Sheppard's Corrections. The formulae for the moments of the point
binomial, mean = $qn$, $\sigma^2 = pqn$, are exact without any corrections such as are
used for grouped material. This fact has led us all (apparently) to assume that
in fitting the curve to the point binomial one would get a better fit by equating
the moments of the curve to the uncorrected moments of the point binomial
rather than to the corrected moments. The studies made in connection with
the preparation of this paper show that when the purpose is to equate areas to
sums of terms the corrected moments should be used. The theoretical basis
for this conclusion is as follows:

To simplify the argument let us suppose that one were seeking that curve of
Charlier type,

$$F(x) = c_0\phi_0(x) + c_3\phi_3(x) + \cdots + c_n\phi_n(x), \qquad (4)$$

¹ With the assistance of Burton H. Camp.
(where $\phi_0$ is the normal curve and $\phi_1, \phi_2, \cdots$ its successive derivatives) whose
integral would best fit the graph of (3). Since fitting is required only at the
isolated points $x = \tfrac{1}{2}, 1\tfrac{1}{2}, 2\tfrac{1}{2}, \cdots$, it is clear that one might obtain this by the
two following steps. First let $f(x)$ be any function whose integral meets exactly
the requirement at these isolated points. What values this integral has at other
points does not for the moment concern us. There are an infinite number of
such $f(x)$ curves. Next let the c's of (4) be so chosen that $F(x)$ will fit $f(x)$ as
nearly as possible. The ordinary derivation of the c's supposes that the fit
between $f(x)$ and $F(x)$ is to be made by least squares, the residuals being weighted
by the factor $e^{x^2/2}$. No matter what $f(x)$ is chosen, the c's can be determined
so that the weighted integral of $(f(x) - F(x))^2$ will be a minimum, but the
value of this minimum will vary from one $f(x)$ to another. We now desire to
select that $f(x)$ which will make this minimum value as small as possible, and
it is reasonable to suppose that our best selection will be some $f(x)$ which is as
kindred to the nature of $F(x)$ as possible. We shall not therefore choose an
$f(x)$ which oscillates wildly between the points where perfect fitting is required
(Fig. 1), nor yet an $f(x)$ which is made up of the top bases of the point binomial
histogram; we shall prefer a modification (Fig. 2) of that histogram by a smoothing
process. Such an $f(x)$ will not have the exact moments of the point binomial,
but, more nearly, those moments corrected for grouping. Then the determination
of the c's will come out in terms of these corrected moments, not in terms of
the uncorrected moments. (In fact the uncorrected moments would be the
exact moments of an $f(x)$ having an oscillatory character between the important
points.)

Of course, when n is large, the difference is too small to be noticed and the use
of Sheppard's corrections is not worth while, and since n usually is large when
approximations of this sort are needed, the point is not usually important. It
was important in the computation of the tables of §4. Moreover, the use of
Sheppard's corrections does not invariably yield better results, the gain being
masked sometimes by other effects to be considered in §3. An excellent illustration
of uniformly better results is in fitting $(\tfrac{1}{2} + \tfrac{1}{2})^9$ by a curve of Type A.
The errors in the sums as derived from (4), with and without the corrections, are
given in the following table.
  t                       0       1       2       3       4       5       6       7       8       9
  With corrections      .0002   .0001  -.0003  -.0001           .0001
  Without corrections   .0007   .0022   .0039   .0036   .0000          -.0039  -.0022  -.0007  -.0001


3. The Stubby End. The other effects which mask this improvement are
especially noticeable at the stubby end of a point binomial. We have to keep
in mind here that the approximating curve (such as (4)) is required to turn a
sharp corner, for, due to the least square method of fitting, it is just as important
that it be close to zero when t is negative, as it is that it be close to $u_0, u_1, \cdots$
when t is positive. Therefore, in order to turn this corner it has to dip below the
x-axis in the neighborhood of $t = -\tfrac{1}{2}$. This makes the approximating curve too
low just to the right of $t = -\tfrac{1}{2}$, unless the whole curve be arbitrarily widened.
This arbitrary widening is customarily performed by not using Sheppard's
correction for $\sigma$, and the result is a betterment of the fit at these points but a
corresponding loss over the rest of the infinite interval. A good example² is
(I + The fit is worse at the left end when Sheppard's corrections are used
but better over the rest of the interval.

The same difficulty arises in another connection. If we compare the closeness
of fit to a point binomial made by $F(x)$ as written in (4) and by $F(x)$ as it would
be written if $c_4$ were zero, it often happens (as is well known) that the latter is
actually slightly better on the average. How can this be true if the c's are
chosen by the method of least squares and the best choice as thus indicated
makes $c_4$ different from zero? The answer is that the c's are chosen so that the
fit is best over the infinite interval, not merely over the interval from $t = -\tfrac{1}{2}$
to $t = n + \tfrac{1}{2}$, and that furthermore the distant points are weighted more heavily
than those near the center. Thus it might happen that a choice, other than the
least square choice, and one in which $c_4$ would be zero, might be better for the
restricted interval covered by the point binomial. This does happen especially
when, due to the abruptness of the stubby end of a very skew binomial, the
curve has to dip below the axis in order to get by a sharp corner. A good example
is the problem considered by Fry:³ (,% -j- All the effects mentioned
are present here. The fit is on the average a little worse if $c_4$ is not equal
to zero over the point binomial interval, a little better over the infinite interval.

4. For graphical purposes a sufficiently good approximation to the median of
$(p + q)^n$ is given by

$$M = nq - (p - q)/6.$$


² The true values are given on page 220 of Mathematical Part of Elementary Statistics,
by Camp, D. C. Heath and Company, 1931.

³ T. C. Fry, Probability and its Engineering Uses, p. 258, Van Nostrand, 1928.
The following tables enable us to find the first quartile $Q_1$, and the ninth decile
$D_9$. The accuracy to which they can be plotted is only about one-tenth that to
which they are given here. Therefore accurate interpolation is seldom necessary.
The values of $S_{t+1}$ are to be read from the graph at the points $t + \tfrac{1}{2}$, as
indicated in the directions preceding the tables. The graphical method will be
found efficient if one uses common sense in the computation. Numbers which
are to be plotted should not be computed to a higher degree of accuracy than
can be used graphically. In reading the values of $S_{t+1}$ it is well to remember
that the true values lie on a curve, and that outside the interval from $Q_1$ to $D_9$,
they are slightly less than those given by the straight line. Once the graph has
been made, all the values of $S_{t+1}$ can be read quickly; it is not necessary to make
a separate computation for each t. This method is therefore specially advantageous
when one wishes to find several sums of this sort for the same point
binomial. It should also be noticed that one can tell from the appearance of
the graph about how far the true sum would be from the two straight lines and
so estimate the error to which his reading is liable.

Fig. 3

5. Illustration. Find the sum of the first 7 terms of $(\tfrac{2}{3} + \tfrac{1}{3})^{25}$.
Here $t = 6$, $M = 8.278$, $Q_1 = 6.726$, $D_9 = 11.369$. The graph shows that
$S_7 = 0.224$. The true value is 0.222. So the error is 0.002.
An idea of the accuracy of the method is given by the errors (out of two places)
that would be obtained for this point binomial for various values of t, as follows:

  t         2     4     6     8    10    12    14    16
  Errors   .00   .01   .00   .00   .00   .00   .00   .00
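The illustration can be retraced numerically. The sketch below rests on assumptions that should be kept in view: the binomial is taken to be (2/3 + 1/3)^25, which reproduces the quoted median M = 8.278; the quartile 6.726 is copied from the text rather than recomputed; and the line from M to the first quartile is modelled as a straight line in the normal deviate, which is one reading of the graphical construction rather than the author's stated procedure.

```python
from math import comb
from statistics import NormalDist

n, p, q = 25, 2 / 3, 1 / 3     # assumed binomial (p + q)^n
t = 6                          # sum of the first t + 1 = 7 terms

# Exact sum u_0 + ... + u_t with u_k = C(n, k) p^(n-k) q^k.
exact = sum(comb(n, k) * p ** (n - k) * q ** k for k in range(t + 1))

# Median approximation M = nq - (p - q)/6 from section 4.
M = n * q - (p - q) / 6

# First quartile as quoted in the illustration (taken from the text, not recomputed).
Q1 = 6.726

# On probability paper the ordinate is the normal deviate, so the line M-Q1
# passes through (M, 0) and (Q1, z_25), where z_25 is the deviate of 0.25.
normal = NormalDist()
z_25 = normal.inv_cdf(0.25)
slope = z_25 / (Q1 - M)                      # deviate per unit of abscissa
reading = normal.cdf(slope * (t + 0.5 - M))  # read the sum at t + 1/2

print(f"M = {M:.3f}, exact = {exact:.4f}, straight-line reading = {reading:.4f}")
```

Under these assumptions the exact sum rounds to the 0.222 quoted as the true value, and the straight-line reading falls within a few thousandths of it, which is the order of error the text reports for a reading taken from the graph.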


Directions for Use of the Tables: Let $p = q$, $M = nq - (p - q)/6$,
$Q_1 = x_1 + qn$, $D_9 = x_2 + qn$. On the graph draw the lines $MQ_1$ and $MD_9$.

Read $S_{t+1}$ at $t + \tfrac{1}{2}$.


Values of $x_1$

 p \ n    2000    1000     750     500     400     300     200     100      75      50      25
  .99    -.693   -.701   -.704   -.710   -.714   -.720   -.728   -.747   -.756   -.771   -.804
  .98    -.688   -.693   -.696   -.700   -.703   -.707   -.714   -.728   -.735   -.746   -.770
  .97    -.685   -.690   -.692   -.696   -.698   -.701   -.707   -.718   -.724   -.734   -.784
  .96    -.684   -.687   -.689   -.693   -.695   -.697   -.702   -.712   -.718   -.726   -.744
  .95    -.683   -.686   -.688   -.691   -.692   -.695   -.699   -.708   -.713   -.721   -.737
  .94    -.682   -.685   -.686   -.689   -.691   -.693   -.697   -.706   -.709   -.717   -.732
  .93    -.681   -.684   -.686   -.688   -.689   -.691   -.695   -.703   -.707   -.713   -.727
  .92    -.681   -.683   -.685   -.687   -.688   -.690   -.693   -.701   -.704   -.710   -.723
  .91    -.680   -.683   -.684   -.686   -.687   -.689   -.692   -.699   -.702   -.708   -.720
  .90    -.680   -.682   -.683   -.686   -.686   -.688   -.690   -.697   -.700   -.704   -.717
  .88    -.679   -.681   -.682   -.684   -.686   -.686   -.689   -.695   -.697   -.702   -.713
  .85    -.679   -.680   -.681   -.682   -.683   -.685   -.687   -.691   -.694   -.698   -.707
  .80    -.677   -.679   -.679   -.681   -.681   -.682   -.684   -.688   -.690   -.693   -.700
  .75    -.677   -.678   -.678   -.679   -.680   -.681   -.682   -.685   -.686   -.689   -.694
  .70    -.676   -.677   -.677   -.678   -.679   -.679   -.680   -.682   -.683   -.685   -.690
  .65    -.676   -.676   -.677   -.677   -.677   -.678   -.678   -.680   -.681   -.682   -.686
  .60    -.676   -.676   -.676   -.676   -.676   -.677   -.677   -.678   -.679   -.680   -.682
  .50    -.675   -.675   -.675   -.675   -.675   -.675   -.675   -.675   -.675   -.675   -.675

ERRATA

The Annals of Mathematical Statistics
Volume VI, No. 1, March, 1935

On page 25, in Directions for Use of the Tables, $p = q$ should read $p \ge q$, $Q_1 = x_1 + qn$
should read $Q_1 = x_1\sigma + qn$, $D_9 = x_2 + qn$ should read $D_9 = x_2\sigma + qn$. In the tables of
values of $x_1$ under p = .97, n = 25, instead of -.784 the number should be -.764.
Values of $x_2$

 p \ n    2000    1000     750     500     400     300     200     100      75      50      25
  .99    1.307   1.318   1.325   1.336   1.344   1.366   1.378   1.439   1.481   See Auxiliary
  .98    1.299   1.307   1.311   1.318   1.323   1.330   1.343   1.376   1.396   Table
  .97    1.295   1.301   1.304   1.310   1.314   1.319   1.329   1.353   1.367
  .96    1.293   1.298   1.301   1.306   1.309   1.313   1.321   1.341   1.352
  .95    1.292   1.296   1.299   1.303   1.305   1.309   1.316   1.332   1.342
  .94    1.291   1.295   1.297   1.300   1.303   1.306   1.312   1.327   1.335
  .93    1.290   1.293   1.295   1.298   1.301   1.304   1.309   1.322   1.329   1.342   1.374
  .92    1.289   1.292   1.294   1.297   1.299   1.302   1.307   1.318   1.325   1.336   1.365
  .91    1.289   1.292   1.293   1.296   1.298   1.300   1.305   1.315   1.321   1.331   1.357
  .90    1.288   1.291   1.292   1.295   1.296   1.299   1.303   1.313   1.318   1.325   1.351
  .88    1.287   1.290   1.291   1.293   1.296   1.297   1.300   1.309   1.313   1.321   1.341
  .85    1.286   1.288   1.289   1.291   1.292   1.294   1.297   1.304   1.308   1.314   1.330
  .80    1.285   1.287   1.288   1.289   1.290   1.291   1.293   1.298   1.301   1.306   1.317
  .75    1.284   1.286   1.286   1.287   1.288   1.289   1.291   1.294   1.297   1.300   1.308
  .70    1.284   1.285   1.285   1.286   1.286   1.287   1.288   1.291   1.293   1.296   1.301
  .65    1.283   1.284   1.284   1.285   1.285   1.286   1.286   1.288   1.290   1.292   1.296
  .60    1.283   1.283   1.283   1.284   1.284   1.284   1.285   1.286   1.287   1.288   1.291
  .50    1.282   1.282   1.282   1.282   1.282   1.282   1.282   1.282   1.282   1.282   1.282






Auxiliary Table

 p \ n      60      50      40      35      30      25      20
  .99    1.525   1.576   1.663   1.740   1.871   2.149   3.209
  .98    1.416   1.435   1.456   1.488   1.520   1.568   1.652
  .97    1.381   1.394   1.413   1.433   1.445   1.472   1.514
  .96    1.362   1.372   1.387   1.397   1.410   1.428   1.457
  .95    1.350   1.359   1.370   1.378   1.389   1.405   1.425
  .94    1.336   1.349   1.359   1.366   1.375   1.387   1.406




INEQUALITIES AMONG AVERAGES


By Nilan Norris

Numerous inequalities among averages of various types are condensed in the
monotonic character of the function

$$\phi(t) = \left(\frac{x_1^t + x_2^t + \cdots + x_n^t}{n}\right)^{1/t}$$

of the positive numbers $x_1, x_2, \cdots, x_n$, not all equal each to each. For $t = -1$
this function is the harmonic mean; for $t = 0$ it is the geometric mean; for $t = 1$
the arithmetic mean; and for $t = 2$ the root mean square. The relations
among these four means which customarily are proved by special and disconnected
methods appear easily as applications of the theorem that $\phi(t)$ is
an increasing function of t. That is, for any values of $t_1$ and $t_2$ such that
$-\infty < t_1 < t_2 < \infty$, it will be true that $\phi(t_1) < \phi(t_2)$. Several proofs of this
theorem have been published, many of them very complex. An extremely simple
proof is herewith presented.¹

That $\phi(t)$, $\phi'(t)$, and $\phi''(t)$ all exist and are continuous for all real values of t
may be shown by expanding each of the quantities $x_i^t$ in a series of powers of t
and considering the remainders after each of the first three terms. The ordinary
rule for evaluating forms reducing to 0/0, which requires the function under
consideration to be continuous and to have at least a continuous first derivative
for $t = 0$, may then be applied to $\log \phi(t) = [\log(\Sigma x^t / n)]/t$ to show that $\phi(0)$ is the geometric
mean. It is clear that $\phi(-\infty)$ and $\phi(+\infty)$ are respectively the least and the
greatest of the $x_i$. This fact and the monotonic property of $\phi(t)$ make it evident
that for each real value of t, the function may be regarded as an average in the
usual sense that it lies within the range of the observations.

For a simple demonstration of the increasing character of $\phi(t)$ consider the
auxiliary function

$$F(t) = t^2\,\frac{\phi'(t)}{\phi(t)} = t^2\,\frac{d}{dt}\left[\frac{1}{t}\log\frac{\Sigma x^t}{n}\right] = t\,\frac{\Sigma x^t \log x}{\Sigma x^t} - \log\frac{\Sigma x^t}{n}.$$

It is clear that $\phi'(t)$ has the same sign as $F(t)$. The theorem will be proved by
showing that the sign of $F(t)$ is positive for all values of t except zero, when
$F(t)$ vanishes.


¹ Professor Harold Hotelling rendered invaluable assistance in condensing for publication
the material herein presented from a more extended study of generalized mean value
functions.
Differentiating the last expression with respect to t, one obtains upon simplification

$$F'(t) = \frac{t}{(\Sigma x^t)^2}\left[(\Sigma x^t)(\Sigma x^t \log^2 x) - (\Sigma x^t \log x)^2\right].$$

By Cauchy's inequality (known as Schwarz' inequality when applied to integrals
instead of sums), the expression in square brackets is positive. Hence $F'(t)$
has the same sign as t. Consequently $F(t)$, since it diminishes for negative
values of t and increases for positive values, has a minimum for $t = 0$. But by
direct substitution, $F(0) = 0$. It follows that $F(t)$ and $\phi'(t)$ are positive for all
values of t other than zero. Therefore $\phi(t)$ is an increasing function.
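The theorem and the behaviour of the auxiliary function can be observed on a small numerical case. The sketch below is an illustration only: the sample `xs` is hypothetical, the function names are introduced here, and F(t) is taken in the form t Σx^t log x / Σx^t - log(Σx^t / n), which is what differentiating log φ(t) and multiplying by t² gives.

```python
from math import exp, log

def power_mean(xs, t):
    """phi(t) = (sum(x^t) / n)^(1/t) for positive xs; the geometric mean at t = 0."""
    n = len(xs)
    if t == 0:
        return exp(sum(log(x) for x in xs) / n)
    return (sum(x ** t for x in xs) / n) ** (1.0 / t)

def F(xs, t):
    """Auxiliary function t * sum(x^t log x) / sum(x^t) - log(sum(x^t) / n)."""
    n = len(xs)
    s = sum(x ** t for x in xs)
    return t * sum(x ** t * log(x) for x in xs) / s - log(s / n)

xs = [1.0, 2.0, 4.0, 8.0]     # hypothetical sample, not all equal
ts = [-3, -1, -0.5, 0, 0.5, 1, 2, 3]
means = [power_mean(xs, t) for t in ts]
for t, m in zip(ts, means):
    print(f"t = {t:5}  phi(t) = {m:.4f}  F(t) = {F(xs, t):.4f}")
```

For this sample the printed means rise steadily with t, passing through the harmonic, geometric, arithmetic and root-mean-square values, while F(t) is zero at t = 0 and positive elsewhere.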

By direct general methods it is possible to show that

$$\phi'(0) = (\Pi x)^{1/n}\,\frac{1}{2n^2}\left[n\Sigma(\log x)^2 - (\Sigma \log x)^2\right].$$

This expression obviously vanishes only when $n\Sigma(\log x)^2 = (\Sigma \log x)^2$, a condition
which is satisfied only in the trivial case when $x_1 = x_2 = \cdots = x_n$.

A proof exactly parallel to that given above may be applied to integrals or,
more generally, to Stieltjes integrals. The monotonic increasing character of

$$\left[\int_{x=0}^{\infty} x^t\,d\psi(x)\right]^{1/t}$$

appears in this way if one assumes that $\psi(x)$ is a non-decreasing
function integrable in the Riemann-Stieltjes sense, such that $\psi(\infty) - \psi(0) = 1$,
and such that $\int_{x=0}^{\infty} x^t\,d\psi(x)$ exists for every real value of t. In terms of statistical
theory, this consideration extends the theorem from samples to populations of a
very general character.

Proof of the increasing character of $\phi(t)$ has also been derived from Hölder's
inequality, the demonstration being expressed in terms of Stieltjes integrals.²
The simplest general proof of the monotonic attribute of $\phi(t)$ heretofore published
appears to be that of Paul Lévy.³ As early as 1840 Bienaymé⁴ presented a
generalized form of $\phi(t)$, namely,

$$\left(\frac{c_1 x_1^t + c_2 x_2^t + \cdots + c_n x_n^t}{c_1 + c_2 + \cdots + c_n}\right)^{1/t},$$

and announced, without proof, its increasing character. In 1858 a proof of the
monotonic quality of $\phi(t)$ for special cases was published by Schlömilch.⁵ Of


² J. Shohat, "Stieltjes Integrals in Mathematical Statistics," Annals of Mathematical
Statistics (American Statistical Association, Ann Arbor, 1930), Vol. 1, No. 1, p. 84.

³ Calcul des Probabilités (Gauthier-Villars et Cie., Paris, 1925), pp. 167 f.

⁴ Jules Bienaymé, Société Philomatique de Paris, Extraits des Procès-Verbaux des Séances
Pendant l'Année 1840 (Imprimerie D'A. René et Cie., Paris, 1841), Séance du 13 juin 1840,
p. 68.

⁵ O. Schlömilch, "Ueber Mittelgrössen verschiedener Ordnungen," Zeitschrift für Mathematik
und Physik (B. G. Teubner, Leipzig, 1858), Vol. 3, pp. 303 f.
the more recent general proofs of the increasing character of $\phi(t)$ which have
appeared, those of Jensen,⁶ Pólya,⁷ Jessen,⁸ and Carathéodory⁹ may be mentioned.
A recent application of $\phi(t)$ to index number theory is that of Professor
John B. Canning.¹⁰

Vassar College.


⁶ J. L. W. V. Jensen, "Sur Les Fonctions Convexes Et Les Inégalités Entre Les Valeurs
Moyennes," Acta Mathematica (Beijers Bokförlagsaktiebolag, Stockholm, 1905), Vol. 30,
pp. 183-186.

⁷ G. Pólya and G. Szegö, Aufgaben und Lehrsätze Aus Der Analysis (Julius Springer,
Berlin, 1925), Vol. I, pp. 54 f. and 210.

⁸ Børge Jessen, "Bemaerkninger om konvekse Funktioner og Uligheder imellem Middelvaerdier,"
Matematisk Tidsskrift (Charles Johansens Bogtrykkeri, Copenhagen, 1931),
No. 2, 1931, pp. 26-28.

⁹ Attributed to Professor Constantin Carathéodory in an unpublished manuscript of
Professor Harold Hotelling.

¹⁰ "A Theorem Concerning a Certain Family of Averages of a Certain Type of Frequency
Distribution," a paper presented before a joint meeting of the American Statistical Association
and the Econometric Society at Berkeley, California, June 22, 1934.



MATHEMATICAL EXPECTATION OF PRODUCT MOMENTS OF SAMPLES
DRAWN FROM A SET OF INFINITE POPULATIONS

By Hyman M. Feldman¹

Introduction

In the second part of his investigations, "On the Mathematical Expectation of
Moments of Frequency Distributions,"² Tchouproff presented a method which
may be interpreted as sampling from a set of infinite univariate populations.
In the present paper this method is extended to the study of moments of product
moments of samples drawn from a set of infinite bivariate populations. It is
also shown how this method may be extended to populations of higher order by
deriving some of the simpler formulae for populations of three and four variables.

Tchouproff's method has been criticised³ because of the complicated algebra.
On close examination it is found, however, that it is not the algebra which is
complicated but rather the symbolism. Tchouproff introduced a great variety
of symbols both in his derivations and in his results. As a consequence his work
seems very intricate. If, however, the number of symbols is reduced, and the
symbols themselves are simplified, which can be easily accomplished, the underlying
idea of Tchouproff's method is found to be very simple.

Quite a complete study of product moments of any bivariate population has
been made by Joseph Pepper in his "Studies in the Theory of Sampling."⁴ His
method is essentially an extension of Church's⁵ method, in his studies of univariate
populations, to bivariate populations. He does not, however, derive any
generalized formulae. In the present study generalized formulae for both the
first moment and the variance of product moments of any order are obtained.

It may be noted here that all of Pepper's formulae for any infinite population
can be obtained from those of the present study as special cases, by assuming
that all the populations in the set are identical.


¹ A dissertation presented to the Board of Graduate Studies of Washington University in
partial fulfilment of the requirements for the degree of Doctor of Philosophy, June 1933.

² Biometrika, Vol. XXI, Dec. 1929, pp. 231-268.

³ Church, A. E. R., "On the Means and Squared Standard Deviations of Small Samples
from any Population," Biometrika, Vol. XVIII, Nov., 1926, pp. 321-394.

⁴ Biometrika, Vol. XXI, Dec. 1929, pp. 231-258.

⁵ Church, A. E. R., "On the Means and Squared Standard Deviations of Small Samples
from any Population," Biometrika, Vol. XVIII, Nov., 1926, pp. 321-394.
Chapter I. Notations and Definitions

Let $(X_1, Y_1), (X_2, Y_2), \cdots (X_n, Y_n)$ be n bivariate populations each following
any law of distribution whatever. The product moment of order a in X and b
in Y of the k-th population will be denoted by $P^k_{ab}$. It is defined as

$$P^k_{ab} = E\left[(X_k - a_k)^a (Y_k - b_k)^b\right] \qquad (1.11)$$

where

$$a_k = E(X_k), \qquad b_k = E(Y_k), \qquad (1.12)$$

and where the symbol E signifies the expected value or the mathematical expectation
of a quantity.

Regarding each of the n populations of the set as infinite,⁶ samples of n are
drawn, each member of a sample from one of the n populations.⁷ The individual
which is drawn from the k-th population will be denoted by $(x_k, y_k)$; and the
product moment of order a in x and b in y, of such a sample will be denoted by
$p_{ab}$. This product moment may then be defined as

$$p_{ab} = n^{-1} S (x_k - \bar x)^a (y_k - \bar y)^b \qquad (1.13)$$

where

$$\bar x = n^{-1} S x_k, \qquad \bar y = n^{-1} S y_k. \qquad (1.14)$$

The symbols a and b will now be defined by the equations

$$a = n^{-1} S a_k, \qquad b = n^{-1} S b_k. \qquad (1.15)$$

Obviously,

$$E(\bar x) = E(n^{-1} S x_k) = n^{-1} S E(x_k) = n^{-1} S a_k = a. \qquad (1.16)$$

Similarly $E(\bar y) = b$. That is, the mathematical expectation of the mean of
such a sample as was described above is equal to the average of the means of all
the populations.⁸

In order to make the equations as compact as possible the following additional
symbols will be employed:

$$\begin{aligned}
x_k - a_k &= u_k, & \bar x - a &= u, & u_k - u &= U_k, \\
y_k - b_k &= v_k, & \bar y - b &= v, & v_k - v &= V_k, \\
a_k - a &= A_k, & b_k - b &= B_k. &&
\end{aligned} \qquad (1.17)$$

From the above definitions it easily follows that

$$E(u_k) = E(v_k) = E(U_k) = E(V_k) = E(u) = E(v) = 0. \qquad (1.18)$$


⁶ The term infinite is used here in the probability sense. It is defined very clearly by
Church in his "Means and Squared Standard Deviations of Small Samples," Biometrika,
Vol. XVIII, Nov., 1926, p. 322.

⁷ It may be easily shown that this is equivalent to drawing a sample of n from a set of
any finite number of populations. The number drawn from each population, however,
must be specified. See Biometrika, Vol. XIII, 1920-21, p. 296, footnote.

⁸ This, of course, is a result of the Lexis Theory, for Poisson and Lexis Series.
The notation is now completed with the definition of the symbol $Q_{rs}$ by the
equation:

$$Q_{rs} = S (a_k - a)^r (b_k - b)^s = S A_k^r B_k^s. \qquad (1.19)$$

Chapter II. The Mathematical Expectation of $p_{ab}$

The mathematical expectation of $p_{ab}$ will be denoted by $\bar p_{ab}$. In the terminology
of moments this would be called the mean or first moment of the distribution
of $p_{ab}$.


1. The Mathematical Expectation of $p_{11}$. According to the above notation
the expected value of $p_{11}$ is $\bar p_{11}$. By definition

$$\bar p_{11} = E(p_{11}) = E\,n^{-1} S (x_i - \bar x)(y_i - \bar y), \qquad (2.11)$$

and obviously $E\,n^{-1} S (x_i - \bar x)(y_i - \bar y) = n^{-1} S E (x_i - \bar x)(y_i - \bar y)$.

Writing

$$x_i - \bar x = [(x_i - a_i) - (\bar x - a)] + [a_i - a] = U_i + A_i,$$

$$y_i - \bar y = [(y_i - b_i) - (\bar y - b)] + [b_i - b] = V_i + B_i,$$

equation (2.11) may be written as

$$\bar p_{11} = n^{-1} S E (U_i + A_i)(V_i + B_i) = n^{-1} S E(U_i V_i) + n^{-1} S A_i E(V_i) + n^{-1} S B_i E(U_i) + n^{-1} S E(A_i B_i).$$

Since for any given population $A_i$ and $B_i$ are constants, it follows that
$E(A_i B_i) = A_i B_i$. Hence

$$n^{-1} S E(A_i B_i) = n^{-1} S A_i B_i = n^{-1} Q_{11}.$$

Making use of (1.18), it is seen that the terms $n^{-1} S A_i E(V_i)$ and $n^{-1} S B_i E(U_i)$
are zero. The only term left to evaluate is therefore $n^{-1} S E(U_i V_i)$. Since $U_i$
and $V_i$ are symmetric functions of the corresponding small letters, their product
is symmetric in $u_i v_i$. There is therefore no loss in generality if attention is
concentrated on a single subscript, say 1.

We may therefore write

$$n^{-1} S E(U_i V_i) = n^{-1} E(U_1 V_1) + n^{-1} S_{i=2}^{n} E(U_i V_i).$$

Remembering that $U_i = u_i - u = u_i - n^{-1} S u_i$, we may write,

$$U_i = u_i - u = u_i - n^{-1}(u_1 + u_2 + \cdots + u_n) = n^{-1}\left[n_1 u_i - (u_1 + u_2 + \cdots + u_{i-1} + u_{i+1} + \cdots + u_n)\right]$$
where $n_1 = n - 1$. In general, $n_i$ will denote the number $n - i$. Similarly

$$V_i = n^{-1}\left[n_1 v_i - (v_1 + \cdots + v_{i-1} + v_{i+1} + \cdots + v_n)\right].$$

Thus

$$n^{-1} S E(U_i V_i) = n^{-3} E (n_1 u_1 - u_2 - \cdots - u_n)(n_1 v_1 - v_2 - \cdots - v_n)$$
$$\qquad + n^{-3} S_{i=2}^{n} E (n_1 u_i - u_1 - \cdots - u_{i-1} - u_{i+1} - \cdots - u_n)(n_1 v_i - v_1 - \cdots - v_{i-1} - v_{i+1} - \cdots - v_n).$$

When the right hand side of the last equation is expanded the only terms which
appear are of the form $E(u_i v_i)$ and $E(u_i v_j)$. The last one must vanish for $u_i$
and $v_j$ are independent and hence $E(u_i v_j) = E(u_i) E(v_j) = 0$. From the last
equation above it is easily seen that the coefficient of $E(u_1 v_1)$ is

$$n^{-3}(n_1^2 + n_1) = n^{-3} n_1 (n_1 + 1) = n^{-2} n_1;$$

and because of the symmetry this is obviously the coefficient of any term of
that form. Hence

$$n^{-1} S E(U_i V_i) = n^{-2} n_1 S E(u_i v_i).$$

Since $u_i = x_i - a_i$, $v_i = y_i - b_i$, then

$$E(u_i v_i) = E(x_i - a_i)(y_i - b_i) = E(X_i - a_i)(Y_i - b_i) = P^i_{11}$$

and in general,

$$E(u_k^a v_k^b) = P^k_{ab}. \qquad (2.12)$$

We thus get the formula

$$\bar p_{11} = n^{-2} n_1 S P^i_{11} + n^{-1} Q_{11}. \qquad (1)$$

Now suppose all the n populations are identical. Then all the A's and also
all the B's vanish and therefore $Q_{11} = 0$. The formula (1) thus becomes

$$\bar p_{11} = \frac{n - 1}{n} P_{11}. \qquad (1')$$

This is exactly Pepper's formula for $\bar p_{11}$ for an infinite population.⁹
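Formula (1), read as $\bar p_{11} = n^{-2}(n-1)\,S P^i_{11} + n^{-1} Q_{11}$, can be checked exactly on small discrete populations by enumerating every possible sample. The sketch below is an illustration only: the three two-point populations are hypothetical, the helper names are introduced here, and exact rational arithmetic is used so that the two sides can be compared without rounding.

```python
from fractions import Fraction as Fr
from itertools import product

# Hypothetical populations: each is a list of ((x, y), probability) pairs.
populations = [
    [((0, 0), Fr(1, 2)), ((2, 1), Fr(1, 2))],
    [((1, 3), Fr(1, 3)), ((4, 0), Fr(2, 3))],
    [((2, 2), Fr(1, 4)), ((3, 5), Fr(3, 4))],
]
n = len(populations)

def mean_xy(pop):
    """Population means (a_k, b_k)."""
    a = sum(pr * x for (x, _), pr in pop)
    b = sum(pr * y for (_, y), pr in pop)
    return a, b

def P11(pop):
    """Population product moment E[(X - a_k)(Y - b_k)]."""
    a, b = mean_xy(pop)
    return sum(pr * (x - a) * (y - b) for (x, y), pr in pop)

# Direct expectation of p_11 over every possible sample (one draw per population).
direct = Fr(0)
for draw in product(*populations):
    weight = Fr(1)
    for _, pr in draw:
        weight *= pr
    xbar = sum(Fr(x) for (x, _), _ in draw) / n
    ybar = sum(Fr(y) for (_, y), _ in draw) / n
    p11 = sum((x - xbar) * (y - ybar) for (x, y), _ in draw) / n
    direct += weight * p11

# Formula (1): n^-2 (n - 1) S P11_i + n^-1 Q11.
means = [mean_xy(pop) for pop in populations]
a = sum(m[0] for m in means) / n
b = sum(m[1] for m in means) / n
Q11 = sum((ak - a) * (bk - b) for ak, bk in means)
formula = Fr(n - 1, n * n) * sum(P11(pop) for pop in populations) + Q11 / n

print(direct, formula)
```

The two printed fractions coincide for this choice of populations, and the same comparison can be repeated with any other finite populations by editing the list.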

2. The Mathematical Expectation of $p_{21}$. By definition

$$\bar p_{21} = E\,n^{-1} S (x_i - \bar x)^2 (y_i - \bar y). \qquad (2.21)$$

⁹ Biometrika, Vol. XXI, p. 233, Eq. A, $N = \infty$. As was already stated in the introduction,
all the formulae of the present study reduce to Pepper's when the above assumption
is made.
Proceeding as above it is seen that

$$E\,n^{-1} S (x_i - \bar x)^2 (y_i - \bar y) = n^{-1} S E (x_i - \bar x)^2 (y_i - \bar y)$$
$$= n^{-1} S E (U_i + A_i)^2 (V_i + B_i) = n^{-1} S E(U_i^2 V_i) + 2 n^{-1} S E(U_i V_i A_i)$$
$$\qquad + n^{-1} S E(U_i^2 B_i) + n^{-1} S E(V_i A_i^2) + 2 n^{-1} S E(A_i B_i U_i) + n^{-1} S E(A_i^2 B_i). \qquad (2.22)$$

It is quite evident that the two terms before the last vanish. To evaluate the
remaining terms, we employ the reasoning of section 1 of this chapter and write:

$$S E(U_i^2 V_i) = E(U_1^2 V_1) + S_{i=2}^{n} E(U_i^2 V_i)$$
$$= n^{-3} E (n_1 u_1 - u_2 - \cdots)^2 (n_1 v_1 - v_2 - \cdots) + n^{-3} S_{i=2}^{n} E (n_1 u_i - u_1 - \cdots)^2 (n_1 v_i - v_1 - \cdots).$$

Since terms of the form $E(u_i^2 v_j)$ vanish, only the coefficient of the term $E(u_i^2 v_i)$
must be found. Again considering the subscript 1, the coefficient of $E(u_1^2 v_1)$ is
easily found from the last equation to be

$$n^{-3}(n_1^3 - n_1) = n^{-3} n_1 (n_1 + 1)(n_1 - 1) = n^{-2} n_1 n_2.$$

Thus

$$n^{-1} S E(U_i^2 V_i) = n^{-3} n_1 n_2 S E(u_i^2 v_i) = n^{-3} n_1 n_2 S P^i_{21}. \qquad (2.23)$$

For the second term of (2.22) we have

$$S E(U_i V_i A_i) = E(U_1 V_1 A_1) + S_{i=2}^{n} E(U_i V_i A_i)$$
$$= n^{-2} E (n_1 u_1 - u_2 - \cdots)(n_1 v_1 - v_2 - \cdots) A_1 + n^{-2} S_{i=2}^{n} E (n_1 u_i - u_1 - \cdots)(n_1 v_i - v_1 - \cdots) A_i.$$

The coefficient of $E(u_1 v_1)$ in the first term of the right hand side of the last
equation is $n^{-2} n_1^2 A_1$. In the second term it is $n^{-2} S_{i=2}^{n} A_i = -n^{-2} A_1$,
since $S A_i = 0$. The total coefficient is thus $n^{-2}(n_1^2 - 1) A_1 = n^{-1} n_2 A_1$.

It therefore follows that

$$2 n^{-1} S E(U_i V_i A_i) = 2 n^{-2} n_2 S P^i_{11} A_i. \qquad (2.24)$$

Quite similarly

$$n^{-1} S E(U_i^2 B_i) = n^{-2} n_2 S P^i_{20} B_i, \qquad (2.25)$$

and it is obvious that

$$n^{-1} S E(A_i^2 B_i) = n^{-1} Q_{21}. \qquad (2.26)$$

Note that the $u_i$ which has the coefficient $n_1$ does not occur among the u's which have
the negative sign.

We thus get the formula

$$\bar p_{21} = n^{-3} n_1 n_2 S P^i_{21} + n^{-2} n_2 S (2 P^i_{11} A_i + P^i_{20} B_i) + n^{-1} Q_{21}. \qquad (2)$$
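Formula (2) can be tested in the same exact manner as formula (1). The sketch below is an illustration under stated assumptions: it reads formula (2) as $\bar p_{21} = n^{-3} n_1 n_2\, S P^i_{21} + n^{-2} n_2\, S(2 P^i_{11} A_i + P^i_{20} B_i) + n^{-1} Q_{21}$, uses four hypothetical two-point populations so that neither $n_1$ nor $n_2$ is degenerate, and introduces the helper names `moment` and `sample_p21` for the purpose.

```python
from fractions import Fraction as Fr
from itertools import product

# Hypothetical populations: each is a list of ((x, y), probability) pairs.
populations = [
    [((0, 0), Fr(1, 2)), ((2, 1), Fr(1, 2))],
    [((1, 3), Fr(1, 3)), ((4, 0), Fr(2, 3))],
    [((2, 2), Fr(1, 4)), ((3, 5), Fr(3, 4))],
    [((5, 1), Fr(2, 5)), ((0, 4), Fr(3, 5))],
]
n = len(populations)
n1, n2 = n - 1, n - 2

def moment(pop, r, s):
    """Central product moment P_rs of one population."""
    a = sum(pr * x for (x, _), pr in pop)
    b = sum(pr * y for (_, y), pr in pop)
    return sum(pr * (x - a) ** r * (y - b) ** s for (x, y), pr in pop)

def sample_p21(draw):
    """Sample product moment n^-1 S (x - xbar)^2 (y - ybar) for one sample."""
    xbar = Fr(sum(x for (x, _), _ in draw), n)
    ybar = Fr(sum(y for (_, y), _ in draw), n)
    return sum((x - xbar) ** 2 * (y - ybar) for (x, y), _ in draw) / n

# Direct expectation over every possible sample (one draw per population).
direct = Fr(0)
for draw in product(*populations):
    weight = Fr(1)
    for _, pr in draw:
        weight *= pr
    direct += weight * sample_p21(draw)

# Deviations A_i, B_i of the population means from their averages, and Q_21.
a_k = [sum(pr * x for (x, _), pr in pop) for pop in populations]
b_k = [sum(pr * y for (_, y), pr in pop) for pop in populations]
a, b = sum(a_k) / n, sum(b_k) / n
A = [ak - a for ak in a_k]
B = [bk - b for bk in b_k]
Q21 = sum(Ai ** 2 * Bi for Ai, Bi in zip(A, B))

formula = (
    Fr(n1 * n2, n ** 3) * sum(moment(pop, 2, 1) for pop in populations)
    + Fr(n2, n ** 2) * sum(
        2 * moment(pop, 1, 1) * Ai + moment(pop, 2, 0) * Bi
        for pop, Ai, Bi in zip(populations, A, B)
    )
    + Q21 / n
)

print(direct, formula)
```

Agreement of the two fractions for this example supports the reading of the three groups of terms given above, though it is of course a check on one case and not a proof.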

3. The Mathematical Expectation of $p_{31}$ and $p_{22}$.

$$\bar p_{31} = E\,n^{-1} S (x_i - \bar x)^3 (y_i - \bar y) = n^{-1} S E (x_i - \bar x)^3 (y_i - \bar y)$$
$$= n^{-1} S E (U_i + A_i)^3 (V_i + B_i) = n^{-1} S E\left(U_i^3 V_i + U_i^3 B_i + 3 U_i^2 V_i A_i\right.$$
$$\qquad \left. + 3 U_i^2 A_i B_i + 3 U_i V_i A_i^2 + 3 U_i A_i^2 B_i + V_i A_i^3 + A_i^3 B_i\right). \qquad (2.31)$$

The two terms before the last are zero. The last term is

$$n^{-1} S E(A_i^3 B_i) = n^{-1} Q_{31}. \qquad (2.32)$$

By (2.23) and (2.24) and some slight manipulation

$$3 n^{-1} S E(U_i^2 A_i B_i + U_i V_i A_i^2) = 3 n^{-2} n_2 S (P^i_{20} A_i B_i + P^i_{11} A_i^2) + 3 n^{-3} (Q_{11} S P^i_{20} + Q_{20} S P^i_{11}), \qquad (2.33)$$

and by (2.22)

$$n^{-1} S E(U_i^3 B_i + 3 U_i^2 V_i A_i) = n^{-4} (n_1^3 + 1) S (P^i_{30} B_i + 3 P^i_{21} A_i). \qquad (2.34)$$

The only new term which is to be evaluated is $S E(U_i^3 V_i)$. This may be
written as follows:

$$S E(U_i^3 V_i) = n^{-4} S E (n_1 u_i - u_1 - \cdots)^3 (n_1 v_i - v_1 - \cdots).$$

When the right hand side is expanded it is found that the only non-vanishing
terms are of the form $E(u_i^3 v_i)$ and $E(u_i^2 u_j v_j)$. Only two subscripts, therefore,
have to be considered. Without any loss in generality these may be taken as
1 and 2, and the right hand side of the last equation may then be written as
follows:

$$S E (n_1 u_i - u_1 - \cdots)^3 (n_1 v_i - v_1 - \cdots) = E (n_1 u_1 - u_2 - \cdots)^3 (n_1 v_1 - v_2 - \cdots)$$
$$\qquad + E (n_1 u_2 - u_1 - \cdots)^3 (n_1 v_2 - v_1 - \cdots) + S_{i=3}^{n} E (n_1 u_i - u_1 - u_2 - \cdots)^3 (n_1 v_i - v_1 - v_2 - \cdots).$$

From this last expansion it is easily seen that the coefficient of $E(u_1^3 v_1)$ is $(n_1^4 + n_1)$
and that of $E(u_1^2 u_2 v_2)$, $(6 n_1^2 + 3 n_2) = 3(2 n_1^2 + n_2)$. We thus finally obtain

$$S E(U_i^3 V_i) = n^{-4}\left\{(n_1^4 + n_1) S E(u_i^3 v_i) + 3(2 n_1^2 + n_2) S E(u_i^2 u_j v_j)\right\}.$$

But by (2.12) $E(u_i^3 v_i) = P^i_{31}$, and since $u_i$ and $u_j$ and $u_i$ and $v_j$ are independent
$E(u_i^2 u_j v_j) = E(u_i^2) E(u_j v_j) = P^i_{20} P^j_{11}$. Whence

$$S E(U_i^3 V_i) = n^{-4}\left\{(n_1^4 + n_1) S P^i_{31} + 3(2 n_1^2 + n_2) S P^i_{20} P^j_{11}\right\}. \qquad (2.35)$$



HYMAN M. FELDMAN 


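From (2.31) and the succeeding equations we finally get

$$\begin{aligned}\bar p_{31} &= n^{-5}\{(n_1^4 + n_1)SP^i_{31} + 3(2n_1^2 + n_2)SP^i_{20}P^j_{11}\}\\ &\quad + n^{-4}\{(n_1^3 + 1)S(P^i_{30}B_i + 3P^i_{21}A_i)\} + 3n^{-3}\{(n_1^2 - 1)S(P^i_{20}A_iB_i + P^i_{11}A_i^2)\\ &\quad + Q_{11}SP^i_{20} + Q_{20}SP^i_{11}\} + n^{-1}Q_{31}.\end{aligned} \tag{3}$$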

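The derivation of $\bar p_{22}$ is so similar to that of $\bar p_{31}$ that it would be mere repetition to go through the details again. We shall therefore merely write down the formula for $\bar p_{22}$, which is

$$\begin{aligned}\bar p_{22} &= n^{-5}\{(n_1^4 + n_1)SP^i_{22} + (2n_1^2 + n_2)S(P^i_{20}P^j_{02} + 4P^i_{11}P^j_{11})\}\\ &\quad + 2n^{-4}\{(n_1^3 + 1)S(P^i_{21}B_i + P^i_{12}A_i)\} + n^{-3}\{(n_1^2 - 1)S(P^i_{20}B_i^2 + 4P^i_{11}A_iB_i\\ &\quad + P^i_{02}A_i^2) + Q_{20}SP^i_{02} + Q_{02}SP^i_{20} + 4Q_{11}SP^i_{11}\} + n^{-1}Q_{22}.\end{aligned} \tag{4}$$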


4. The Mathematical Expectation of the General Product Moment $p_{ab}$.
So far, formulae for the mathematical expectation of $p_{ab}$, for particular values of a and b, have been derived. The method used in deriving these is, however, perfectly general, and now that it has been sufficiently illustrated, it can be easily generalized.

By definition we have

$$\bar p_{ab} = E[n^{-1}S(x_i - \bar x)^a(y_i - \bar y)^b].$$

Making use of the notation of Chapter I this may be written as

$$n\bar p_{ab} = ES(U_i + A_i)^a(V_i + B_i)^b = \underset{q,r=0}{\overset{a,b}{S}}C^a_qC^b_rSE(U_i^{a-q}V_i^{b-r}A_i^qB_i^r) \tag{2.41}$$

where

$$C^a_q = \frac{a!}{q!(a-q)!}, \qquad C^b_r = \frac{b!}{r!(b-r)!}.$$

Expressing the U's and V's in terms of the u's and v's and setting a - q = l, b - r = m, we may write for a particular pair of values q and r:

$$n^{l+m}SE(U_i^lV_i^mA_i^qB_i^r) = SE(n_1u_i - u_1 - \cdots)^l(n_1v_i - v_1 - \cdots)^mA_i^qB_i^r. \tag{2.42}$$

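Consider, now, the general term in the expansion of the right hand side of (2.42). It is of the form:

$$\frac{l!\,m!}{\Pi\alpha_h!\,\Pi\beta_h!}(-1)^{l+m}(-n_1)^{\alpha_i+\beta_i}E(\Pi u_{j_h}^{\alpha_h}v_{j_h}^{\beta_h})A_i^qB_i^r, \tag{2.43}$$

where $\Pi\alpha_h! = \alpha_1!\,\alpha_2!\cdots\alpha_k!$.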


* In this case, and also in the formulae that follow, whenever two or more indices 
appear in a summation, it will be understood that no two of them can have the same 
value simultaneously. 



MATHEMATICAL EXPECTATION OF PRODUCT MOMENTS


For particular sets of values $j_1, j_2, \cdots, j_k$, $\alpha_1, \alpha_2, \cdots, \alpha_k$, and $\beta_1, \beta_2, \cdots, \beta_k$, this term will appear in every member of the summation of the right hand side of (2.42), and its coefficient will differ only in the exponent of $(-n_1)$ and in the subscript i of $A_i^qB_i^r$. Because of the symmetry there is no loss in generality if we take for $j_1, j_2, \cdots, j_k$ the first k integers. We now break up the summation of the right hand side of (2.42) as follows:

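$$\begin{aligned}\underset{1}{\overset{n}{S}}E(n_1u_i - u_1 - \cdots)^l(n_1v_i - v_1 - \cdots)^mA_i^qB_i^r &= E(n_1u_1 - u_2 - \cdots)^l(n_1v_1 - v_2 - \cdots)^mA_1^qB_1^r\\ &\quad + E(n_1u_2 - u_1 - \cdots)^l(n_1v_2 - v_1 - \cdots)^mA_2^qB_2^r + \cdots\\ &\quad + E(n_1u_k - u_1 - \cdots)^l(n_1v_k - v_1 - \cdots)^mA_k^qB_k^r\\ &\quad + \underset{i=k+1}{\overset{n}{S}}E(n_1u_i - u_1 - \cdots)^l(n_1v_i - v_1 - \cdots)^mA_i^qB_i^r.\end{aligned} \tag{2.44}$$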

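From (2.44) we easily get for the total coefficient (excluding the numerical factor) the expression

$$\underset{h=1}{\overset{k}{S}}(-n_1)^{\alpha_h+\beta_h}A_h^qB_h^r + \underset{i=k+1}{\overset{n}{S}}A_i^qB_i^r.$$

Writing

$$\underset{i=k+1}{\overset{n}{S}}A_i^qB_i^r = \underset{i=1}{\overset{n}{S}}A_i^qB_i^r - \underset{h=1}{\overset{k}{S}}A_h^qB_h^r = Q_{qr} - \underset{h=1}{\overset{k}{S}}A_h^qB_h^r,$$

the general term, (2.43), together with the total coefficient, may then be written as

$$\frac{(-1)^{l+m}\,l!\,m!}{\Pi\alpha_h!\,\Pi\beta_h!}\Big\{Q_{qr} + \underset{h=1}{\overset{k}{S}}[(-n_1)^{\alpha_h+\beta_h} - 1]A_h^qB_h^r\Big\}E\,\Pi u_h^{\alpha_h}v_h^{\beta_h}.$$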

Since $u_i$ and $u_j$, $v_i$ and $v_j$, and $u_i$ and $v_j$ are independent while $u_i$ and $v_i$ are not, we have:

I. $E\,\Pi u_h^{\alpha_h}v_h^{\beta_h} = \Pi E u_h^{\alpha_h}v_h^{\beta_h} = \Pi P^h_{\alpha_h\beta_h}$.

II. Any term in which $\alpha_h + \beta_h = 1$ must vanish.

From II it follows that the maximum number of subscripts which can appear in any term in the expansion of (2.42), i.e. the upper limit of k, which will be denoted by t, cannot exceed $(l + m)/2$. In fact when $l + m$ is even, $t = (l + m)/2$, while when $l + m$ is odd, t is the largest integer less than $(l + m)/2$.

Making use of (2.41), the equations following it, and the reasoning of the last paragraph, we finally get the formula:

The following restrictions on the $\alpha$'s and $\beta$'s must be observed:

(a) $\alpha_1 + \alpha_2 + \cdots + \alpha_k = a - q$

(b) $\beta_1 + \beta_2 + \cdots + \beta_k = b - r$

(c) $\alpha_h + \beta_h \neq 1$.

In case the n populations are identical (5) reduces as follows: For $q = 0$, $r = 0$, $A_i^q = 1$, $B_i^r = 1$, and $Q_{00} = n$; while in every other case $A_i^qB_i^r = 0$, $Q_{qr} = 0$. The summations with respect to q and r, therefore, disappear.
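Consider now the summations

$$\underset{j_1=1}{\overset{n}{S}}\;\underset{j_2=1}{\overset{n}{S}}\cdots\underset{j_k=1}{\overset{n}{S}}P^{j_1}_{\alpha_1\beta_1}P^{j_2}_{\alpha_2\beta_2}\cdots P^{j_k}_{\alpha_k\beta_k}.$$

Since all the populations are the same we may drop the j by actually carrying out the indicated summations. If, then, there are c repetitions among the k pairs of integers $\alpha_h\beta_h$, in which $\alpha_1\beta_1, \alpha_2\beta_2, \cdots, \alpha_c\beta_c$ are repeated $l_1, l_2, \cdots, l_c$ times respectively, then we have:

$$\underset{j_1=1}{\overset{n}{S}}\cdots\underset{j_k=1}{\overset{n}{S}}\Pi P^{j_h}_{\alpha_h\beta_h} = \frac{k!\,C^n_k}{l_1!\,l_2!\cdots l_c!}\Pi P_{\alpha_h\beta_h}.$$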

We thus arrive at the following corollary: The mathematical expectation, $\bar p_{ab}$, of the product moment, $p_{ab}$, in samples of n from a single infinite population having any law of distribution is given by


n(— n)“'*'‘pa(i 


O Ttto — t ^ ^ 

not* • HPa ijfc-'-lLA-'l 
Note: In deriving these general formulae it was assumed that $n > t$. There is, however, no loss in generality in this assumption. For, if $t > n$, we may suppose that $x_{n+1} = x_{n+2} = \cdots = x_t = 0$, and hence $P^{n+1}_{\alpha\beta} = \cdots = P^t_{\alpha\beta} = 0$, and thus the above reasoning is still valid.


5. Formulae for $\bar p_{41}$, $\bar p_{32}$, $\bar p_{51}$, $\bar p_{42}$, $\bar p_{33}$. Formulae for $\bar p_{ab}$ in which $a + b = 5, 6, 7, 8$ have been obtained. But for $(a + b) > 6$ these formulae become very long, and since these will be of no use in the subsequent work, only those of order 5 and 6 are given below.

pii = {(nS - nOBPli + 2nnlS(2PUPli + SPiiPJo)) 

+ n-‘ ((nl + m)B(Pi„B. + API^A;) + 6nn,S(P^oB.PJo + 2 PlU.P^o)} 


* This is a generalization of Pepper's results for $N = \infty$. See Biometrika, Vol. XXI, pp. 231-240.

fThe symbol PiiAtPii, is an abbreviation of the full term (A, + d.,) (PjiPro + 
P 11 P 20 ). Similar abbreviations will be used in the other formulae. 
+ {(n? + 1) Si2Pi,A.B, + - 2Qi,SPU - 3Q^•^SPl,\ 

+ 2n-> ((n? - l)Si2Pl^AlSPioAlBi) + 2Q,oSPl^ + SQnSPU] + n-^Qa. (6) 
= nr* {(ni — ni^SPh + nn^SiPloPoi "h 6 P 21 P 11 + SPaoPia)} 
n~'* {(wj — 1)»S(2P3 iBi + 3 P 2 j^,) + SwjisSCPJ qPJ iP,' 4" IP 20 P 02 
+ iPliP{i]A,)} + n-* Kn! + DSiPUBl + &Pl^A,B, 

+ 3PUA\) - Q^PU - QQuSPU - SQ^oSPU ! + n-* ((n*i - l)^(3Piol.P“ 
6Pii.il<Pt 4 “ Pq a-'l ») "l~ ^Qw/SPJ 0 4“ QQiiSPii 4" Q3o<SPo2}4“^ *^32 • (7) 

pti = ii~'’ {(w* 4" wi)^Ps 1 4" 4“ 4" Th^SiPl oPJ 1 

4-2P|iP|o) - 10(2n? -n2)-SP5iP|„ 4 - 30(3n? 4- n3)SPJ oPJoPj 1 } + 

4 - l)S(Pjo-S» 4" SPJiil,) 4 " 4” 1 )S*[ 2 P 5 oP 2 oPi 4" (^PJoPii 

4" 3Pj iPa — 10'W'W2S*[2P5 oPj 0 P 3 4“ ( 2 P 3 oPi 1 4" ^PaiPao)-^-}] j 
4“ Sw® {(wj — l)P(P4 o^,Pi 4 " 2P3 1-4 *) 4 " 6ww 2<S{P2 0P2 o- 4 iPi 4 " 2P2 oPi i -4 f ) 
4" Qh^CPIo 4" 6 P 20 P 20 ) 4" 2Q2o<S{P3i 4“ 6 P 20 P 11 )} 4~ ] 0 n~'*{(ni 
4 - l)SiPioA]B. + PiM - Q^iSPU - QsoSPJi} 4- 5 n-®KnJ - l)S(2Pj„4?B. 
4 - Pi lAO 4 - 2QnSPi , 4- Q^P'x X } 4- n-‘QK . (8) 

pii ~ w^{(w* 4" ^O^Pia 4" (^1 4“ 4'^)*^(^4 oPo 2 4" 8 P 3 iPJ 1 

4-6P,SPJo) 4-4(2nJ -n2)5(P3’oPi% + SPhPix) 4- 6(3n? 4' n3)<S{PJoP^QPS2 
4 - 4P; oPi^P? 1 ) } 4- 2n- { (nj 4- 1)<S(PJ iP. 4 - 2PJ 2 ^) + 2(ni* + 1)P[(2P5 oPi 1 

4- 3PjiP^)P.- 4- (PSoPSa 4- ePJiPii 4- SPiaPUM.] - 2nn2S[(2P;oP/, 
4-3PiiPio)Si4-(P5oPia4-6P;iPii4-3P;aPioU,]}4-n-®{(n}-l)P(PloP? 
4- SPii^P. 4 - QPhAl) 4 - 6nH2-8[PjoPiiB4 4- 4PjoPii^.P< 4- (P^oP^a 
4- 4P} iP/i)^!^] 4” QmS(Pl 0 4' 6 P 2 flPao) 4" 8QnP(P8 1 4* ^PJ oPi 1 ) 

4 - QP«S{PU 4- PioPht 4- 4PiiPli)} 4 - 4n-*{(n! + 1)P(P;„.A.B? 
4-3P,‘iA?P.4-Pi*A?) - QiaSP**o - 3Q2 iSP^i 4-Q8oSPi.} 

4- n-»{P(6Pja4jP? 4 - 8PUAIB, ^PiM+ Q 40 BPS 2 4- 8Q^,SPU 
+ (iQnSPio} +n-Hi4i. (9) 

pt» — n ’{ (wi 4 * wi)PP5 3 4" 3 (to* 4" 4“ ^)B(Pl iPpt 4" 3 P 2 2 Pi 1 

4-Pi3Pio) - (2 to5 - TO 2 )P(Pj„Po', 4 - 9 P 21 PI 2 ) 4-9(3to? 4-«3)-S(P5oPilP52 

* The repetition of this expression signifies that A and B factors are coupled only with 
those P factors which have corresponding indices. 





4" 4P{iPiiPii)) + 3n“®{(n5 + -}- PI s-^i) + + 1) B[iPl oPit 

+ 6Pj iPi 1 + 3Pi,pio)B< + (P'osPio+ epi^pi 1 + 3^2 iPiM 

- nMPUPu + ^PhPii + ^puPio)B, + iPi^Pio +^Pi2Pii 
+ 3PjiP^2)i;]} +3«-MW - DSiPUB^i+iPi.AiB. + PiM 
+ 3n,n^8[PUPUBl + (PUPU + iPUPU)AiBi + PhPi^Al)] 

+ SlQoiiPU 4- 3PioPii) + 3Qu(P|2 + P5oPj2 H- iPUPU) + Q^iPli 
+ 3PUPU)]} +n~n{n\ + l)S{PUB\+mlA^B^^ +^Pi2AU, + Pi>A\) 

— SiQodPlo -)-^Qi2'^21 4"9Q2iPi 2 -hCsoPos)} 4" 3tt *{(^1 l)>S(Pj 0-4iB j 

+ 3Pl,AlB\ + Pl.AlBd + SiQi,PU + SQsjPii + ^siPJa)} 4- • (10) 

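Chapter III. The Mathematical Expectation of the Variance of $p_{ab}$

1. The Symbols ${}_2m_{p_{ab}}$ and ${}_2M_{p_{ab}}$. Denoting the variance of $p_{ab}$ by ${}_2m_{p_{ab}}$ and the mathematical expectation of ${}_2m_{p_{ab}}$ by ${}_2M_{p_{ab}}$, we have the definition

$${}_2m_{p_{ab}} = \{n^{-1}S(x_i - \bar x)^a(y_i - \bar y)^b - \bar p_{ab}\}^2 = n^{-2}[S(x_i - \bar x)^a(y_i - \bar y)^b]^2 - 2n^{-1}\bar p_{ab}S(x_i - \bar x)^a(y_i - \bar y)^b + \bar p_{ab}^2$$

and

$$\begin{aligned}{}_2M_{p_{ab}} &= E({}_2m_{p_{ab}}) = E\{n^{-2}[S(x_i - \bar x)^a(y_i - \bar y)^b]^2 - 2n^{-1}\bar p_{ab}S(x_i - \bar x)^a(y_i - \bar y)^b + \bar p_{ab}^2\}\\ &= n^{-2}E[S(x_i - \bar x)^{2a}(y_i - \bar y)^{2b}] + 2n^{-2}E[S(x_i - \bar x)^a(x_j - \bar x)^a(y_i - \bar y)^b(y_j - \bar y)^b]\\ &\quad - 2n^{-1}\bar p_{ab}E[S(x_i - \bar x)^a(y_i - \bar y)^b] + \bar p_{ab}^2\\ &= n^{-1}\bar p_{2a,2b} + 2n^{-2}E[S(x_i - \bar x)^a(y_i - \bar y)^b(x_j - \bar x)^a(y_j - \bar y)^b] - \bar p_{ab}^2.\end{aligned} \tag{3.11}$$

Before attempting to expand the right hand side of (3.11) for any values a, b we shall derive the formula for ${}_2M_{p_{11}}$ to illustrate the procedure.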

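2. The Mathematical Expectation of ${}_2m_{p_{11}}$. By (3.11) we have

$${}_2M_{p_{11}} = n^{-1}\bar p_{22} + 2n^{-2}E[S(x_i - \bar x)(y_i - \bar y)(x_j - \bar x)(y_j - \bar y)] - \bar p_{11}^2. \tag{3.21}$$

The first term is given by (4) and the last by (1). The only new term is the middle one. To expand it let us write it in terms of U and V. We then have:

$$\begin{aligned}n^{-2}SE[(x_i - \bar x)(y_i - \bar y)(x_j - \bar x)(y_j - \bar y)] &= n^{-2}SE[(U_i + A_i)(V_i + B_i)(U_j + A_j)(V_j + B_j)]\\ &= n^{-2}S\{E[U_iV_iU_jV_j + (U_iV_iU_jB_j + U_jV_jU_iB_i)\\ &\quad + (U_iV_iV_jA_j + U_jV_jV_iA_i) + (U_iV_iA_jB_j + U_jV_jA_iB_i)\\ &\quad + (U_iV_jA_jB_i + U_jV_iA_iB_j) + U_iU_jB_iB_j + V_iV_jA_iA_j\\ &\quad + \text{vanishing terms} + A_iB_iA_jB_j]\}.\end{aligned} \tag{3.22}$$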
The evaluation of the last term is very simple. For

$$SE(A_iB_iA_jB_j) = S(A_iB_iA_jB_j),$$

and from the elementary theory of symmetric functions we have:

$$S(A_iB_iA_jB_j) = \frac{S^2(A_iB_i) - S(A_i^2B_i^2)}{2}.$$

Hence

$$SE(A_iB_iA_jB_j) = \frac{S^2(A_iB_i) - S(A_i^2B_i^2)}{2} = \frac{Q_{11}^2 - Q_{22}}{2}. \tag{3.23}$$


To expand the first term and also the remaining ones, we return to the u, v notation defined in Chapter I. We then write

$$SE(U_iV_iU_jV_j) = n^{-4}SE[(n_1u_i - u_1 - \cdots)(n_1v_i - v_1 - \cdots)(n_1u_j - u_1 - \cdots)(n_1v_j - v_1 - \cdots)].$$

The only terms which can appear in the expansion of the right hand side of the last equation have the following form:

$$E(u_i^2v_i^2), \quad E(u_i^2v_j^2), \quad E(u_iv_iu_jv_j),$$

i.e., exactly those which appear in the evaluation of $\bar p_{22}$. Remembering the symmetry, there will be no loss in generality if we take for i and j the integers 1 and 2. To find the coefficients of the three characteristic terms, the above summation may be broken up as follows:

$$\begin{aligned}n^4SE(U_iV_iU_jV_j) &= E[(n_1u_1 - u_2 - \cdots)(n_1v_1 - v_2 - \cdots)(n_1u_2 - u_1 - \cdots)(n_1v_2 - v_1 - \cdots)]\\ &\quad + E\{[(n_1u_1 - u_2 - \cdots)(n_1v_1 - v_2 - \cdots) + (n_1u_2 - u_1 - \cdots)(n_1v_2 - v_1 - \cdots)]\\ &\qquad \underset{3}{S}(n_1u_i - u_1 - \cdots)(n_1v_i - v_1 - \cdots)\}\\ &\quad + E\,\underset{3}{S}\,\underset{3}{S}[(n_1u_i - u_1 - \cdots)(n_1v_i - v_1 - \cdots)(n_1u_j - u_1 - \cdots)(n_1v_j - v_1 - \cdots)].\end{aligned} \tag{3.24}$$


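Writing the three terms in a row and their coefficients from the three parts of (3.24) in columns below these terms, we get the following scheme:

$$\begin{array}{lccc} & E(u_i^2v_i^2) & E(u_i^2v_j^2) & E(u_iv_iu_jv_j)\\ \text{First part} & n_1^2 & n_1^2 & (n_1^2 + 1)^2\\ \text{Second part} & n_2(n_1^2 + 1) & -2n_1n_2 & 2n_2^3\\ \text{Third part} & \dfrac{n_2n_3}{2} & \dfrac{n_2n_3}{2} & 2n_2n_3\\ \text{Total coeff.} & \dfrac{nn_1(2n_1 - 1)}{2} & -\dfrac{nn_3}{2} & n(n_1^3 + n_1^2 - 3n_1 + 3)\end{array}$$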
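The totals in this scheme can be recounted by brute force. The sketch below is an illustration rather than the author's procedure; it assumes $n_1 = n - 1$, $n_2 = n - 2$, $n_3 = n - 3$ and that $U_i$ is $n^{-1}$ times a linear form in which $u_i$ has the coefficient $n_1$ and every other $u$ has the coefficient $-1$ (the helper names are hypothetical). For every pair $i < j$ it reads off the coefficients of $u_1^2v_1^2$, $u_1^2v_2^2$ and $u_1v_1u_2v_2$ in the product of the four linear forms and adds them up.

```python
def coefficient_totals(n):
    """Sum, over all pairs i < j, of the coefficients of the three typical terms."""
    n1 = n - 1

    def c(i, a):
        # Coefficient of u_a (or v_a) in the linear form n_1 u_i - u_1 - u_2 - ...
        return n1 if i == a else -1

    t_22 = t_20_02 = t_11_11 = 0
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):
            # u_1^2 v_1^2: every one of the four factors contributes subscript 1
            t_22 += (c(i, 1) * c(j, 1)) ** 2
            # u_1^2 v_2^2: both u-factors give u_1, both v-factors give v_2
            t_20_02 += (c(i, 1) * c(j, 1)) * (c(i, 2) * c(j, 2))
            # u_1 v_1 u_2 v_2: the u-factors split as (1, 2) or (2, 1), likewise the v-factors
            t_11_11 += (c(i, 1) * c(j, 2) + c(i, 2) * c(j, 1)) ** 2
    return t_22, t_20_02, t_11_11


def closed_forms(n):
    """Closed-form totals from the last row of the scheme."""
    n1, n3 = n - 1, n - 3
    return (n * n1 * (2 * n1 - 1) // 2,
            -(n * n3) // 2,
            n * (n1 ** 3 + n1 ** 2 - 3 * n1 + 3))


if __name__ == "__main__":
    for n in range(2, 9):
        print(n, coefficient_totals(n), closed_forms(n))
```

The integer divisions are exact because $nn_1$ and $nn_3$ are always even.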
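With the aid of the above equations we finally get:

$$SE(U_iV_iU_jV_j) = n^{-4}\Big\{\frac{nn_1(2n_1 - 1)}{2}SP^i_{22} - \frac{nn_3}{2}SP^i_{20}P^j_{02} + n(nn_1^2 - 3n_2)SP^i_{11}P^j_{11}\Big\}.$$

Proceeding in the same way we find:

$$\begin{aligned}SE(U_iV_iU_jB_j + U_jV_jU_iB_i) &= n^{-3}(2n_1^2 + n_2)SP^i_{21}B_i\\ SE(U_iV_iV_jA_j + U_jV_jV_iA_i) &= n^{-3}(2n_1^2 + n_2)SP^i_{12}A_i\\ SE(U_iV_iA_jB_j + U_jV_jA_iB_i) &= n^{-2}\{-nn_2SP^i_{11}A_iB_i + (n_1^2 + n_2)Q_{11}SP^i_{11}\}\\ SE(U_iV_jA_jB_i + U_jV_iA_iB_j) &= n^{-2}\{2nSP^i_{11}A_iB_i - Q_{11}SP^i_{11}\}\\ SE(U_iU_jB_iB_j + V_iV_jA_iA_j) &= n^{-2}\{nS(P^i_{20}B_i^2 + P^i_{02}A_i^2) - \tfrac{1}{2}(Q_{02}SP^i_{20} + Q_{20}SP^i_{02})\}\end{aligned}$$

Collecting terms and simplifying we finally get:

$$\begin{aligned}{}_2M_{p_{11}} &= n^{-4}\{n_1^2SP^i_{22} + S(P^i_{20}P^j_{02} + 2P^i_{11}P^j_{11}) - n_1^2S(P^i_{11})^2\}\\ &\quad + 2n^{-3}n_1S(P^i_{21}B_i + P^i_{12}A_i) + n^{-2}S(P^i_{20}B_i^2 + 2P^i_{11}A_iB_i + P^i_{02}A_i^2).\end{aligned} \tag{11}$$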

Corollary 1. In case $X_i = Y_i$, i.e., when the set of populations are univariate, (11) becomes

$${}_2M_{p_{20}} = n^{-4}\{n_1^2S[P^i_{40} - (P^i_{20})^2] + 4SP^i_{20}P^j_{20}\} + 4n^{-3}n_1SP^i_{30}A_i + 4n^{-2}SP^i_{20}A_i^2. \tag{11'}$$

This is Tchouproff's formula for the expected value of the variance of samples of n.†

Corollary 2. In case the n populations are identical (11) becomes

$${}_2M_{p_{11}} = n^{-3}n_1[n_1P_{22} + P_{20}P_{02} - n_2P_{11}^2]. \tag{11''}$$‡
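Formula (11) and Corollary 2 lend themselves to an exact check on small discrete populations. The sketch below is illustrative only: it reads ${}_2M_{p_{11}}$ as $E(p_{11} - \bar p_{11})^2$, takes $n_1 = n - 1$, $n_2 = n - 2$, $A_i = a_i - \bar a$, $B_i = b_i - \bar b$, and counts each distinct term of a double sum once, so that $S(P^i_{20}P^j_{02} + 2P^i_{11}P^j_{11})$ equals the sum of $P^i_{20}P^j_{02} + P^i_{11}P^j_{11}$ over ordered pairs $i \neq j$. The populations in `POPS` and all function names are hypothetical.

```python
from fractions import Fraction as F
from itertools import product


def central(pop, r, s):
    """Central product moment P_rs of a discrete population [(x, y, prob), ...]."""
    a = sum(p * x for x, y, p in pop)
    b = sum(p * y for x, y, p in pop)
    return sum(p * (x - a) ** r * (y - b) ** s for x, y, p in pop)


def exact_variance_p11(pops):
    """E(p_11 - E p_11)^2 by enumerating every sample (one draw per population)."""
    n = len(pops)
    e1 = e2 = F(0)
    for draw in product(*pops):
        prob = F(1)
        for _, _, p in draw:
            prob *= p
        xbar = F(sum(x for x, _, _ in draw), n)
        ybar = F(sum(y for _, y, _ in draw), n)
        p11 = sum((x - xbar) * (y - ybar) for x, y, _ in draw) / n
        e1 += prob * p11
        e2 += prob * p11 ** 2
    return e2 - e1 ** 2


def formula_11(pops):
    """Right hand side of (11) under the conventions stated above."""
    n = len(pops)
    n1 = n - 1
    a = [sum(p * x for x, y, p in pop) for pop in pops]
    b = [sum(p * y for x, y, p in pop) for pop in pops]
    A = [ai - sum(a) / n for ai in a]
    B = [bi - sum(b) / n for bi in b]
    P = {rs: [central(pop, *rs) for pop in pops]
         for rs in [(2, 2), (2, 0), (0, 2), (1, 1), (2, 1), (1, 2)]}
    idx = range(n)
    cross = sum(P[2, 0][i] * P[0, 2][j] + P[1, 1][i] * P[1, 1][j]
                for i in idx for j in idx if i != j)
    t1 = (n1 ** 2 * sum(P[2, 2]) + cross
          - n1 ** 2 * sum(q * q for q in P[1, 1])) / F(n ** 4)
    t2 = F(2 * n1, n ** 3) * sum(P[2, 1][i] * B[i] + P[1, 2][i] * A[i] for i in idx)
    t3 = sum(P[2, 0][i] * B[i] ** 2 + 2 * P[1, 1][i] * A[i] * B[i]
             + P[0, 2][i] * A[i] ** 2 for i in idx) / F(n ** 2)
    return t1 + t2 + t3


def corollary_2(pop, n):
    """Identical populations: n^-3 n_1 [n_1 P_22 + P_20 P_02 - n_2 P_11^2]."""
    n1, n2 = n - 1, n - 2
    return F(n1, n ** 3) * (n1 * central(pop, 2, 2)
                            + central(pop, 2, 0) * central(pop, 0, 2)
                            - n2 * central(pop, 1, 1) ** 2)


# Hypothetical populations with different means, used only for the check.
POPS = [
    [(0, 0, F(1, 2)), (1, 2, F(1, 4)), (3, 1, F(1, 4))],
    [(1, 1, F(1, 3)), (2, 0, F(2, 3))],
    [(0, 1, F(1, 2)), (4, 3, F(1, 2))],
]

if __name__ == "__main__":
    print(exact_variance_p11(POPS), formula_11(POPS))
    print(exact_variance_p11([POPS[0]] * 3), corollary_2(POPS[0], 3))
```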


3. The Mathematical Expectation of ${}_2m_{p_{ab}}$. We now return to the general equation

$${}_2M_{p_{ab}} = n^{-1}\bar p_{2a,2b} - \bar p_{ab}^2 + 2n^{-2}SE(x_i - \bar x)^a(y_i - \bar y)^b(x_j - \bar x)^a(y_j - \bar y)^b. \tag{3.11}$$


* Since $E(u_i^2v_i^2) = P^i_{22}$, $E(u_i^2v_j^2) = P^i_{20}P^j_{02}$, etc.
† See Biometrika, Vol. XIII p. 2M.

‡ See Biometrika, Vol. XXI p. 234, Cor. I.
The first two terms are given by (5). To evaluate the last term we write:
SE[(x, - xy(y, - y^ix, - xy(y, - y)'’] = SE[(U, + A,y{V, + + A,) 


{V, + B,y] ^ SE{U’iV\hyV^) + 


s 


ri.rj.rs.r^-O 




b 

*‘4 


a 


SE{U‘:V’iU^,V‘A'yB'MyB’',*) = n-2<“+«5^{(niW. K?iiw. - • 

1 

{n,u,- YimOi )‘ + S n^n+r^'^+H) 8B[{mu, )“ 

rj.. r4 

{n,Vi yin,u, - . . -yiniv, YAyByAyBy, ( 3 . 3 p 

where $\alpha = a - r_1$, $\beta = a - r_2$, $\gamma = b - r_3$, $\delta = b - r_4$.

The right hand side of (3.31) has been broken up into two parts because the first part is symmetrical, while the second part, in general, is not except when $r_1 = r_2$ and $r_3 = r_4$.

Let us now consider the expression

$$SE[(n_1u_i - \cdots)^a(n_1v_i - \cdots)^b(n_1u_j - \cdots)^a(n_1v_j - \cdots)^b]. \tag{3.32}$$

This is a double summation in which $c_{ij} = c_{ji}$, and in which the diagonal terms, $c_{ii}$, are missing.

Consider next a general term of k factors from the expansion of each bracket of (3.32). As we are dealing with symmetric functions, there will be no loss in generality if we consider the first k subscripts only; and if we let the lower limits of the exponents of the u's and v's begin with zero we may consider that each parenthesis of a given bracket contributes exactly k factors. Such a term, omitting the coefficient, may be written as follows:










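$$E\prod_{h=1}^{k}u_h^{\alpha_h}v_h^{\beta_h}u_h^{\alpha'_h}v_h^{\beta'_h} = \prod_{h=1}^{k}P^h_{(\alpha_h+\alpha'_h)(\beta_h+\beta'_h)}. \tag{3.33}$$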

This term occurs in every one of the $\tfrac{1}{2}nn_1$ brackets of (3.32), having the same numerical coefficient in every one of them, which is

$$\frac{(a!)^2(b!)^2}{\Pi\alpha_h!\,\Pi\alpha'_h!\,\Pi\beta_h!\,\Pi\beta'_h!}. \tag{3.34}$$


To obtain the ?ii coefficient of (3.33) we break up (3.32) into the following partial 
summations: 

EKniUi — . . . )“(nji>, — . . . y(niu, — ... y(niv, — •..)»] = jB[(wi'iii — • . . 
(fiiui — ■ ■ ■ y (nitij — . .. y(niV2 _...)»] 4- ... 4- E[(niUir-t — • • • )“ 
(niU)ir-l 


y {nm 


(jhiVi »S (niu, 

j’-fi+i 


y(niVk — 

• y(n,v, - ■ 

(nil), - 


. y] + 3 - 

>1+ S [E[in,Ui- 

• )‘(niu, — • ■ • yiniVi - • 


?)] 


From, this equation we get for the total coefficient in n of the term. (3.33) the 
following expression; 

S (_?n)“*+“l'+'’‘+'*l' + nk S + i-nyy^K] + CJ* . 

h.h'~i >‘-1 

The following restrictions on the a’s and fi’s must be observed. 


(a) 


Oil "f* ^2 "I" * • • Ofjfc «= fl 
+ * * • + ^ 


+ ^2 4- • • • 4- 
4" 4" ■ ' ■ 4- ^ 


(c) a* + a( 4 4 5^ 1 . 


From (c) we obtain the upper limit of k, namely: $t = a + b$.
Combining the various above equations we finally obtain:


(n)«<>+« S{U‘‘iV’lU‘‘,Vy = (a!)K6 0“ 


n 

s 


n,b 

s 


«A,0l^r,8A.(S ji-D 



s (-m) v+^' 4 wa 5 [(-ni)“‘+'’‘ 4 (-nOKy + CJ*) 

A,A'-i J 


n P(a\+..^) (8 a+ 8^) 
naj , ! nai ! I n/3( ! ‘ 


(3.35) 


Turning to the second part of (3.31) let us consider the expression 

s EKmu. — ) (nit). ) (ni«,- ) (mt),- ) AysyAyBy 

<-1,1-1 

for a given set of r's. The term (3.33) may also be considered as a general term of this last expression; of course, the exponents of the u's and v's will be different in this case. In order to evaluate the complete coefficient of a term like (3.33) we again write:

iSJsKwiM, - . . -ycniv, - . . -yiniu, - . ■ .)'*(mt',- — )‘ AysyA^By 
= M(nini - ■ . ■ ycmvi - ... )y(niUi - ... yimvt ~ yAyByAysy] 
4 Mni-U, - ■ ■ . )«(ni»! - • • • )i'(niUi - - . • )?(ni»i - • . . 

4 ... 4 M[{niUk ~ ... )“(njt)* — . . . )‘v(niw*_i — . . . )'*(nit)fc_i — • • • )* 
BlU] + SE[{n,u. Yin.v, ^A^B? S 

•'“■l 3-4 + 1 


(fiiUj — ■ • ■ yintVj 


yA]^BV] + s E[in,u, — yin,v, — y 


}=1 


niv, 


A]%^ S {niui nmv, )yAyBy]+ S EKmu. )“ 

4r3*“A, + l 

(3.36) 


. Hn^u, y(n,v, yA?B7AyBy] . 


It is now quite easy to write down the complete coefficient of a term of the 
form (3.33). The numerical coefficient of this term is the same in every bracket 
of (3.36), and is 


(~ l)3S„(a - n) ! (a - n ) ! (6 - n ) ! (b - u ) ! 
1 


n«4inar;!n^4!ni3( 


(3.37) 


The coefficient in Wi and Ay By Ay B'* is broken up by (3.36) into the fol- 
lowing four parts : 

k ' ' 

I. S i— n{)“ Ay ByAysy, from the first /fc(A: — 1) brackets. 




II. S S AyBy== S i-ni)“>‘^^’'AyBl 

A"*! 

from the next k(n — h) brackets. Similarly 

III. *S (— rei) r^rirg — S , from the next A:(n — Aj). 

ll'-l L A“1 J 

And finally: 

IV. S AyByA?By = 

i,l-k+l 11 1 

.s AysyAimy- s Aysy s Aysy- s Aysy s AyBy 

A-1 


A-1 


A'-l 


A-1 


A,A'«X 

+ 2 s Ay By s Aimy = Qrir5Qr2r4 Q(ri+r2) (rs+rp — 

h-l h'~l 1 

-QnnSAyBy-' s AyByAyByA-2 5 Aysy s AyByjromthe 


h,h’-l 


A-1 


A'-l 


last cj* brackets. 
The restrictions on the $\alpha$'s and $\beta$'s differ from those given above in that a is replaced by $a - r_1$ and $a - r_2$, and b by $b - r_3$ and $b - r_4$; and from the restriction (c) we get for the upper limit of k, in this case,

$$t_r = \frac{\alpha + \beta + \gamma + \delta}{2} = a + b - \frac{r_1 + r_2 + r_3 + r_4}{2}$$

when $Sr_i$ is even, or the greatest integer less than $a + b - \tfrac{1}{2}Sr_i$ when $Sr_i$ is odd.

Combining (3.37) with (7“^ • • • we get for the general numerical coefficient 
in the expansion of the second part of (3.31), the expression 

(-l),Sn(g!)Hh!)^ 

nn!n«jna;!n/9jn^(!‘ 

By an obvious manipulation we have 

+ Qnn £ [(-m) - 1] 

b-^lL J A“1 

S |^(— Til) * * ~ "h QnrsQmi ”* Q(rl+r2)(T3+rl) • (3.38) 

Finally, combining the various equations we get the formula : 
aMp^ = n-%.aa - pU + 2(n)-2(“+‘+» (o 1)^6 !)* S 




/S Al^Bl^ S 

h"i h* 


7 A “1 


a , b 

s 

/ t 


t 

S 

fc-i 




-f-n* g l(_n,)“'‘+3'‘ - 1 - (-nx)“‘^^*] -f C'S‘ 


k^l 




S (-ni) 

h.h'-l 

^ 2(n)-«°+*+» (o !)Kb 1)^^ S “s 

najmsiina^iwl! ^ ^ ' ,1-1 r,.r,.r,.r,-o Hn ! 








A !""1 




s I(-ni)«*+^A - 1] A{‘B;> S A'^'Bl< - is [(-/i, )«(+?; _ 1] 


h^l 


A -1 


S Al>Bl> + Qran 8 [(-ni)“*+(»* -1] 

)l-l A-l 
4 - Qrm S - 1 ] ^**-Ba* + QrlraQriri — Q(’'}+ri) (rj+H) 


h-1 


HP''* 


{a+a^) 


llaj m I na* ! n/9i ! ‘ 


( 12 ) 


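In case the n populations are identical the second part of (12) must vanish, and in the first part the summations

$$\underset{j_1=1}{\overset{n}{S}}\cdots\underset{j_k=1}{\overset{n}{S}}\prod_{h=1}^{k}P^{j_h}_{(\alpha_h+\alpha'_h)(\beta_h+\beta'_h)}$$

reduce to

$$\frac{k!\,C^n_k\,\Pi P_{(\alpha_h+\alpha'_h)(\beta_h+\beta'_h)}}{l_1!\,l_2!\cdots l_c!},$$

where $l_1, l_2, \cdots, l_c$ are the number of repetitions of the pairs of integers $(\alpha_1 + \alpha'_1)(\beta_1 + \beta'_1), \cdots, (\alpha_c + \alpha'_c)(\beta_c + \beta'_c)$, respectively.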

We then have the following

Corollary: The mathematical expectation of the variance, ${}_2M_{p_{ab}}$, of the product moment, $p_{ab}$, in samples of n from a single infinite population is given by

= P2a,b -Plb + 2(n)-««+*+>) (a !)*(6 !)* “’“s 


J, lb ! C"* 

*-i k\h\ ■■■V 


k 

s 

h,h'- 


. . a*+(J*+«. I+/5 

(-ni) ‘ 


*’ 4- n* S [(— 


+ (-ui) 


'PM 


+ ci 


XJ P(a/i 4" «a) ifik 4" Ph) 



na*!n/3A!na;!n^(! 


(120 


4. The Formula for Formula (12) can by no means be used mechan- 

ically. It does, however, summarize to a great extent the details in finding 
iMp^ for any given values a, b. Formulae for iMpu, jAfpji have been ob- 
tained, but the one for iMp^^ is too long to be included in the paper, especially 
since with a little work it can be easily derived by applying (12). The one for 
is given immediately below. 

- (PJ J*] + «j5[Pi oPJs + 4(PjoP^ - n»P^Ph)] 
-2nln,SPUPU + {nl +2)S{PloPioPU +SPloPliPii) + PSPUPinPU) 
+ 2n-^mnlS{Pi,B, + 2Pi,A, - P5„Pi,B. - 2PiiPiiA,) 

- imruSiPioBiPi, +PUAiPU) - 2n,S[n,PiiP.P|o 4-2(2nj - 3)PJi^PiM 

4 6nSPjiPh4; + 4n*S(P‘„P'\B, 4- PjoPS*^,- 4- PjoA.Pj* 4- 2P|iPioB, 

+ ^j2P5o^,)} +n-^{n?B[PioP? - (P2oP.)*] + 4<SPj«PJo(5. + B,)* 

+ 3(n^ + n,)BP^,4j -I- 4BP^ oPo%(^- 4- ^0* - 2ni<S[P,MPj,^J 
+ 2P’ iPiiU. + 4,-)’] + l&8P[^A.PUAj - 4n^fil(P{ 

+ i{2nl -\-ni)SPiiAiBi - iniSPhAiBiPio - SniSPuPloAjB, 

+ SSiPi^BJ^i.A, + Pi^A^P’M - ^nlSPUAiPioBr 

— 2wiW2W~*S(QsoP 2 2 "l■2(^nP3l) •4” 2ii2W~*8f[6QiiP iiPJo 

+ QAPtoPi,+^PiiPii)]]+‘2rrHnnz8i'iPloA,B^i+2P\iAl+5Pl,A]B) 

- n,8[QMiB. + PisA) + 2Qu8Pi„P, + 2P\,A:)]] + jAt 

4 - iiPioAlBl 4 - P{,A\Bd] - 2nS[(Q^AiBi 4 - QnA\)P{y + QnPioAiB. 

4" QsoPoa-^?] 4" oPo j 4" ^Q2 o(QiiPi 1 4“ QiiPao)}) • (13)“ 

Chapter IV. The Mathematical Expectation of the Third Moment of $p_{11}$

1. The Mathematical Expectation of ${}_3m_{p_{11}}$. Following the notation of the last chapter we shall denote the third moment of $p_{11}$ about its mean by ${}_3m_{p_{11}}$ and the mathematical expectation of ${}_3m_{p_{11}}$ by ${}_3M_{p_{11}}$. We have then by definition

$${}_3m_{p_{11}} = \{n^{-1}S(x_i - \bar x)(y_i - \bar y) - \bar p_{11}\}^3,$$

and by a well known formula we have:

$${}_3M_{p_{11}} = \overline{p_{11}^3} - 3\,{}_2M_{p_{11}}\bar p_{11} - \bar p_{11}^3. \tag{4.11}$$
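The "well known formula" is the usual relation between a third moment about the mean and moments about the origin. A short derivation, writing $E(p_{11}^3)$ for the first term of (4.11), $\bar p_{11} = E(p_{11})$ and ${}_2M_{p_{11}} = E(p_{11} - \bar p_{11})^2$ (this reading of the symbols is an assumption made here for illustration):

```latex
\begin{align*}
{}_3M_{p_{11}} &= E(p_{11} - \bar p_{11})^3
  = E(p_{11}^3) - 3\bar p_{11}E(p_{11}^2) + 3\bar p_{11}^2E(p_{11}) - \bar p_{11}^3 \\
 &= E(p_{11}^3) - 3\bar p_{11}\bigl({}_2M_{p_{11}} + \bar p_{11}^2\bigr) + 2\bar p_{11}^3 \\
 &= E(p_{11}^3) - 3\,{}_2M_{p_{11}}\,\bar p_{11} - \bar p_{11}^3 .
\end{align*}
```

The second line uses $E(p_{11}^2) = {}_2M_{p_{11}} + \bar p_{11}^2$, which is the definition of the variance rearranged.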

The last two terms of (4.11) are given by (1) and (11). To evaluate $\overline{p_{11}^3}$ we write:

$$\begin{aligned}\overline{p_{11}^3} &= E\{n^{-1}S(x_i - \bar x)(y_i - \bar y)\}^3 = n^{-3}SE(x_i - \bar x)^3(y_i - \bar y)^3\\ &\quad + 3n^{-3}SE(x_i - \bar x)^2(y_i - \bar y)^2(x_j - \bar x)(y_j - \bar y)\\ &\quad + 6n^{-3}SE(x_i - \bar x)(y_i - \bar y)(x_j - \bar x)(y_j - \bar y)(x_k - \bar x)(y_k - \bar y).\end{aligned}$$

The first term is simply $n^{-2}\bar p_{33}$, which is given by (10). The evaluation of the second term is not essentially different from the evaluation of the left hand side of (3.22), and since all details have been given there we shall omit them here. To evaluate the last expression let us write:

$$\begin{aligned}&SE(x_i - \bar x)(y_i - \bar y)(x_j - \bar x)(y_j - \bar y)(x_k - \bar x)(y_k - \bar y)\\ &\quad = SE[(U_i + A_i)(V_i + B_i)(U_j + A_j)(V_j + B_j)(U_k + A_k)(V_k + B_k)]\\ &\quad = SE(U_iV_iU_jV_jU_kV_k) + SE(U_iV_iU_jV_jU_kB_k) + \cdots + SE(A_iB_iA_jB_jA_kB_k).\end{aligned} \tag{4.12}$$


In case the n populations are identical this reduces to one of Pepper's formulae, Biometrika, Vol. XXI, p. 238, Cor. 1.
As there is a great deal of similarity among the various terms of the right hand side of (4.12), it will not be necessary to go into the details of the expansion of every one of them. We shall, therefore, indicate the details for the expansion of only two of them, one symmetrical and one non-symmetrical; and as the first two terms are of that type we shall use these for the purpose of illustration. Using the u, v notation we have

$$SE(U_iV_iU_jV_jU_kV_k) = n^{-6}SE[(n_1u_i - \cdots)(n_1v_i - \cdots)(n_1u_j - \cdots)(n_1v_j - \cdots)(n_1u_k - \cdots)(n_1v_k - \cdots)].$$

The maximum number of subscripts appearing in any term evidently being 3, we 
can write without any loss in generality : 

(S.B[(niM, — • • • ) • • • (niVk — ...)]= EUnxU} — . . . )(niVi — . . . )(niU2 — ■ ■ 

mvi — ){niU3 — • • . ) (niva — ■••)] + M{(niUi — )(niVi — ■ • • )[(raiW2 - • ■ • ) 

(niVi — • • • ) + (niMs — • • • )(niz)3 — •••)] + (nnh — • . • )(niVt — . . • ) 

(niits - . • • )(niV3 - . ■ . ) iS(niU. - . . • )(nii;, _ . . . ) 4- B{(niUi - . . . ) 

4 

(niVi — • • • ) + — • • • )(niV 2 — • • • ) + niUa ~ . . . )(niV3 — • • • ) ) 

S(niUi - ■ . . - . . . )(niUj — • . . Xma, - . . . ) + SJSKniUt - 

4 4 

(mtifc (4.13) 

The coefficients of the various terms arising in this expansion can now be 
found quite easily. For example, the coeflicient of Fj gj which is, of course, the 
same as the coefficient of PJ 3 , is easily found to be 

+ n,(2nl + 1) + JLj) 4. _ nnaMSnx - 2) 

2 6 6 

To evaluate the summation SE(jUiVtUiV,UkBk) = n“®ASiP[(niUi — .••) 
(ttiDj — . . . ) {muj — . . . )(niv, — . . . )(%»* — . . . )Bt], we break it up into 
partial summations as follows: 

SE[(niUi — ... )(«ir, _ . . . )(«!«,• — . . . )(niv, ) (niUk )Bi,] 

= E{(niUi — ... )(ni«i — . . . )[(niU2 — . . . )(nit)a — ■ • • )(niU3 — • • • )B8 
+ (niUa — ... )B 2 (niU 3 — . . . )(ni«s — •••)] + • )Bt(niU 2 - . 

(njVi — . .. )(niUs — ■ • . )(niV3 — ■ • •)} 4- E{(niUi — ■ ■ ■ )(niVi ->...) 

[{uiUi — . . . ){niV2 — • • • ) + {.niUz — )(nit)3 — • • • )] 4- («iWs — • ■ • ) 

(niw* — . . . )(niM3 _ . . . )(jiiV3 — ... ) }S(niu, — . . • )Bf 4- E((niUi — ... ) 

4 

(mill — . . . )[(niti2 — . ■ . )B 2 4- (mua — . .. )J?,] 4- (mth — • • • )(nii;j — ... ) 
— ■ ■ ■ )J5i + (nxUs ~ • • • )P3] (wiWa — • ■ ■ • • • ) 

l(niMi — • • . )J5i 4" (wi — • • • )52]}£i(raiMj — . • • )(niiij — • . . ) 

4 

-f E { {n\Ui — - - • ) — , . . ) (mw2 — • • « ) (niVa — • • ■ ) + (riiUa — • - . ) 

(niua - • • ■) } S(niu^ — . . , )inlV^ - . • • )(niW/ — “ )^y + (niUi — • . )-Bi 

4 

”1“ ' — ■ * ’ )-®2 '4" (^1^3 — * ' ' ‘ * * ) 

4 

{mUj - • ■ •)(niV, -...)+ ES(niu, - ■■■ )(niv, - . • • )(niu, - - ■ •) 

4 

(niv, — • . • )(niUf, — • . • )Bj. (4.14) 

The expansion of (4.14) is not as difficult as it appears for only two subscripts 
can appear in any term; the explicit appearance of the subscript 3 is due to the 
fact that we are dealing with a triple summation. We, consequently, do not 
need to expand those parentheses in which B appears. 

We shall now, without any further details, state the final result, which is: 

,Mpn = n~>(S[nlPU - PioPi, + 3nx(PixPL + PioPlO + 3ni(n? + 2)PI,PU 
~ 3(2nl + DPixPi, + + 6(n? + 3ni - 2 )PJiP(iPM 

- 3n,SPUS(nlPU + PUPL ~ - n?(^Pli)^]i 

+ 3n-M>Sln?(P5aB. + P'uA,) + 2a(PuP{xB, + PUPUA,) 

- ^xiPhP'xxB, + PUpM ~ 2ni(PliPiA + PlxPhA,) 

+ {PUP\<Bx + PjiPJiA,) - 2n,{PUPUB, + PIxPUA,) 

+ iPUP^B, + P'ozPUA)]] + 3n-^{SK(P5,£; + PUA]) 

■\-n,{PUPixB\ + - {PixPioBl + Pl,P\xAl) 

- 2(Pj„B.P(iB, + Pl,A,PUA,) 4- 2nxPlM - 2Pi,B,PUA, 

~ 2P\,A,P{xB, + 2n,PlPi,A,B, ~ 2{PUfA,B,]] 4 - n-^\S[{PUB] 4 - PlzM) 

+ 3{PUM + PUA\B,)]] . iU)» 

Where a = n* 4- 4- 1. 

This formula is shorter and simpler than the formula for ${}_2M_{p_{21}}$ although they are of the same order. This is due to the symmetry of ${}_3M_{p_{11}}$.

Chapter V. Product Moments of Trivariate and Quadrivariate Populations 

1. Some additional definitions and notation. In this chapter we shall indicate briefly how the method of the previous chapters may be extended to populations of more than two variables. We shall do this by deriving some of the simpler formulae, corresponding to those of Chapter II, for trivariate and quadrivariate populations.

Cf. Biometrika, Vol. XXI, p. 263, formula (19).

The notation will be slightly changed in that we shall symbolize the new variables by priming the symbols for the variables used in the previous chapters. Thus, we shall indicate the k'th trivariate population by $(X_k, Y_k, X'_k)$ and the quadrivariate population by $(X_k, Y_k, X'_k, Y'_k)$, and samples from such populations by $(x_k, y_k, x'_k)$ and $(x_k, y_k, x'_k, y'_k)$ respectively.

We shall denote by $P^m_{ijk}$ the product moment of the m'th population of order i in X, j in Y, and k in X', and by $P^m_{ijkl}$ the similar product moment for a quadrivariate population. These are defined by the following equations:

$$P^m_{ijk} = E(X_m - a_m)^i(Y_m - b_m)^j(X'_m - c_m)^k \tag{5.11}$$

$$P^m_{ijkl} = E(X_m - a_m)^i(Y_m - b_m)^j(X'_m - c_m)^k(Y'_m - d_m)^l \tag{5.12}$$

where $a_m$, $b_m$, etc. are defined as in Chapter I part 2.

The sample product moments corresponding to $P^m_{ijk}$, $P^m_{ijkl}$ will be denoted by $p_{ijk}$ and $p_{ijkl}$ respectively. They are defined by:

$$p_{ijk} = n^{-1}\underset{m=1}{\overset{n}{S}}(x_m - \bar x)^i(y_m - \bar y)^j(x'_m - \bar x')^k, \tag{5.13}$$

$$p_{ijkl} = n^{-1}\underset{m=1}{\overset{n}{S}}(x_m - \bar x)^i(y_m - \bar y)^j(x'_m - \bar x')^k(y'_m - \bar y')^l. \tag{5.14}$$

Finally we shall designate $E(p_{ijk})$ and $E(p_{ijkl})$ by $\bar p_{ijk}$ and $\bar p_{ijkl}$ respectively.


2. The Mathematical Expectation of $p_{111}$ and $p_{211}$. By definition we have

$$\bar p_{111} = E[n^{-1}S(x_i - \bar x)(y_i - \bar y)(x'_i - \bar x')]. \tag{5.21}$$

Applying the transformations (1.17) this equation becomes

$$\begin{aligned}n\bar p_{111} &= E[S(U_i + A_i)(V_i + B_i)(U'_i + C_i)] = SE(U_iV_iU'_i) + SE(U_iV_iC_i)\\ &\quad + SE(U_iU'_iB_i) + SE(V_iU'_iA_i) + \text{vanishing terms} + SE(A_iB_iC_i).\end{aligned} \tag{5.22}$$

Since $EA_iB_iC_i = A_iB_iC_i$, $SE(A_iB_iC_i) = SA_iB_iC_i$. Following the previous notation we shall put $SA_iB_iC_i = Q_{111}$.

When the expression $SE(U_iV_iU'_i)$ is expanded, no other non-vanishing terms except those of the form $E(u_iv_iu'_i) = P^i_{111}$ can appear. The coefficient of this term will evidently be the same as that of $P^i_{21}$ in (2.23), namely: $n^{-2}n_1n_2$. Whence:

$$SE(U_iV_iU'_i) = n^{-2}n_1n_2SP^i_{111}.$$

The three terms following the first of (5.22) are by (2.24) equal to $n^{-1}n_2S(P^i_{110}C_i + P^i_{101}B_i + P^i_{011}A_i)$.

We thus get:

$$\bar p_{111} = n^{-3}n_1n_2SP^i_{111} + n^{-2}n_2S(P^i_{110}C_i + P^i_{101}B_i + P^i_{011}A_i) + n^{-1}Q_{111}. \tag{15}$$

With the aid of the formulae of II, 3 we easily find the formula 
PjK == ^{('^1 — l)SPlii -f- (2nl -|- n2)S(PiQa8Poii + 2P iio»S'Piin — PlooPou 

‘-2PUoPln)}+n-^{{nl+mPloA + PliA-\-2Ph^A,)} 

-|- a *{(wi — l);S(Pon'd? + 2P -f- 2P uo-d.P, + PaooPt^i) “h Qaoo'^I^ou 
"h 2Qno'SPioi + 2Qi(])SPJii) -f- QoudPJoo)} + ^ (15) 


3. The Mathematical Expectation of $p_{1111}$. The procedure for finding the formula for $\bar p_{1111}$ is very similar to the above. We shall therefore merely state the result.

Pnu = ^ ^{(^1 “ l)'^Piin"l~ (2^1 + 'ft2 );S(P iioqPoqh + PioQiPouo 


+ •P’loioPoi.ox) } + ■n ^{(nil 4- l)»S(PuioP< + PnojP, + PJoiiP* + Poiii-d,)} 

+ w *((^1 + l)/S'(P}iooC'tA + Pioio-^A + Pono^A + ■Poioi'-'1,P, 

+ PJou-diP, + PiooiPt^i + 'S(Qoi)U^ uoo 4" QoiojPioio -h QiooiPoiio 4" Oioio-^oioi 
4- QnooPooii 4" QoiiaPJooi) ) 4“ ^ *Qiui • (17) 


Washington University, St. Louis.



AN APPLICATION OF ORTHOGONALIZATION PROCESS TO THE
THEORY OF LEAST SQUARES

By Y. K. Wong

Introduction 


The present paper is an outgrowth of the writer’s attempt to fill a lacuna in the 
discussion of the Gauss method of substitution as given by many writers. For 
illustration, let us cite Brunt’s Combination of Observations. In Chapter VI, 
we find: 

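Let the normal equations be

$$\begin{aligned}{[aa]}x + [ab]y + [ac]z - [al] &= 0\\ [ab]x + [bb]y + [bc]z - [bl] &= 0\\ [ac]x + [bc]y + [cc]z - [cl] &= 0.\end{aligned} \tag{i}$$

From this equation we find

$$x = -\frac{[ab]}{[aa]}y - \frac{[ac]}{[aa]}z + \frac{[al]}{[aa]}. \tag{ii}$$

Substituting, we obtain

$$\begin{aligned}{[bb1]}y + [bc1]z - [bl1] &= 0\\ [bc1]y + [cc1]z - [cl1] &= 0,\end{aligned} \tag{iii}$$

where

$$[bb1] = [bb] - [ab][ab]/[aa], \text{ etc.} \tag{iv}$$

From the first equation in (iii),

$$y = -\frac{[bc1]}{[bb1]}z + \frac{[bl1]}{[bb1]}. \tag{v}$$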

In connection with equations (ii) and (v), the question naturally arises as to whether or not these numbers [aa], [bb1], ... are all different from zero. Since $[aa] = \Sigma a_ia_i$, one can see that $[aa] \neq 0$ if $a_i \neq 0$ for every i. However, to show the non-vanishing of [bb.1], [cc.2], etc. is by no means simple. Many writers do not give a demonstration on this point. We know that a system of non-homogeneous linear equations has a solution if the system of equations is linearly
independent. Brunt gives a discussion of the independence of the normal equa- 
tions in Chapter V, Art. 3fi, but he does not state clearly a condition for inde- 
pendence. He says: “The condition of independence is in general satisfied in 
the problems which arise in practice. We can then proceed to the formation and solution of the normal equations.” It is one of the aims of this paper to give a necessary and sufficient condition for the independence of the normal equations and to show [aa], [bb.1], etc. are all different from zero when the condition is satisfied.
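The question of whether [aa], [bb.1], [cc.2], ... can vanish may be explored numerically. The following is a minimal Python sketch, not the procedure of this paper: it assumes exact rational arithmetic, and the helper names (`bracket`, `gauss_pivots`, `gram_schmidt_sq_norms`) are hypothetical. It forms the Gauss brackets for a few observation columns, carries out the substitution, and compares each pivot with the squared length of the corresponding vector produced by Gram-Schmidt orthogonalization. The two lists agree, and a pivot is zero exactly when a column depends linearly on the preceding ones.

```python
from fractions import Fraction as F


def bracket(u, v):
    """Gauss bracket [uv] = sum of u_i * v_i."""
    return sum(ui * vi for ui, vi in zip(u, v))


def gauss_pivots(columns):
    """Pivots [aa], [bb.1], [cc.2], ... met while eliminating the normal equations."""
    k = len(columns)
    m = [[F(bracket(columns[r], columns[c])) for c in range(k)] for r in range(k)]
    pivots = []
    for step in range(k):
        pivot = m[step][step]
        pivots.append(pivot)
        if pivot == 0:
            break  # the substitution cannot be continued past a zero pivot
        for r in range(step + 1, k):
            factor = m[r][step] / pivot
            for c in range(step, k):
                m[r][c] -= factor * m[step][c]
    return pivots


def gram_schmidt_sq_norms(columns):
    """Squared lengths of the vectors obtained by orthogonalizing the columns in order."""
    ortho, sq_norms = [], []
    for col in columns:
        w = [F(x) for x in col]
        for q in ortho:
            coef = bracket(w, q) / bracket(q, q)
            w = [wi - coef * qi for wi, qi in zip(w, q)]
        sq = bracket(w, w)
        sq_norms.append(sq)
        if sq == 0:
            break  # this column is a linear combination of the earlier ones
        ortho.append(w)
    return sq_norms


# Hypothetical observation coefficients: a parabola fitted at x = 1, 2, 3, 4.
a = [1, 1, 1, 1]
b = [1, 2, 3, 4]
c = [1, 4, 9, 16]

if __name__ == "__main__":
    print(gauss_pivots([a, b, c]))
    print(gram_schmidt_sq_norms([a, b, c]))
    # A dependent third column (a + b) leads to a vanishing last pivot.
    print(gauss_pivots([a, b, [2, 3, 4, 5]]))
```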

In the theory of least squares, there is the classical method of the derivation of 
normal equations by an application of the notion of minimum in differential 
calculus. After the normal equations are secured, the Gauss method of substi- 
tution is applied to obtain the solution. Doolittle modifies the Gauss method of 
substitution so as to facilitate the labor of computation. However, when the 
number of parameters (or unknowns) exceeds 4, Doolittle’s method is quite 
complicated. In the present paper the writer wishes to present a mathematical 
discussion of a method obtained through an application of the Gram-Schmidt 
orthogonalization process. This method furnishes us a new procedure for
determining the most probable values of the parameters (or unknowns). The
formulation of the system of normal equations will be omitted in this new
procedure, which is particularly effective in fitting curves to time series.
The paper can
be roughly divided into three parts. The first part gives an algebraic derivation 
of the normal equations. The second part derives a condition for a set of 
observation data so that the Gauss method of substitution is applicable. The 
third part gives a relationship between the Gauss method of substitution and the 
orthogonalization process. A practical application of the results of this paper 
will be found in a later paper. 

The process of orthogonalization was already in use in the 19th century, and has
been applied extensively in the theory of integral equations and linear
transformations in Hilbert space. In classical analysis, if
$\varphi_1(x), \varphi_2(x), \dots$, defined on $(0, 1)$, is a normally
orthogonalized system, and if $f(x)$, defined on $(0, 1)$, is such that $f^2$ is
Lebesgue integrable, then the system of Fourier coefficients

$f_r = \int_0^1 f(x)\,\varphi_r(x)\,dx$    $(r = 1, 2, \dots)$

has certain interesting properties, one of which is that

$\lim_{m \to \infty} \int_0^1 \Bigl(f(x) - \sum_{r=1}^{m} f_r \varphi_r(x)\Bigr)^2 dx = 0.$

The preceding notion has a close connection with the theory of least squares as
outlined in many texts on statistics. In section III, the reader will find how
this notion is applied in the derivation of the normal equations. Since the
number of dimensions is finite, the integration process reduces to a summation
process and furthermore no limiting process is used. This new derivation of
normal equations has the advantage that (1) differential calculus is not used,
(2) a new form of normal equations is obtained, (3) the solution of the unknowns
or parameters can be immediately obtained without further application of the
Gauss Method of Substitution or the Doolittle Method, and (4) the formula for
the “quadratic residual” is obtained as a simple corollary.

From the results in section III, we see immediately what condition should be
imposed upon the set of observation data so that the Gauss method of
substitution may be applicable. In section VI, we find a necessary and
sufficient condition for the independence of the system of normal equations
(3.9), and also the fact that when this condition is fulfilled, then, due to the
special nature of the coefficients of the unknowns, the matrix is properly
positive. It is on account of this fact that we are able to show that the
numbers $[aa]$, $[bb.1]$, etc. are all different from zero. The demonstration of
this point is found in section VII. In this section, we lay down a fundamental
hypothesis for Gauss’s method of substitution, namely, that the set of
observations $A_i = (a_{i1}, \dots, a_{in})$, $i = 1, 2, \dots, r$, is linearly
independent. Lemma 7.3 may be called the fundamental lemma for Gauss’s method
of substitution. Some interesting properties of the numbers $[A_sA_t.h]$, where
$s, t = 1, \dots, r$, and $h$ is less than the smaller one of $(s, t)$, are
demonstrated.

From the properties of the numbers $[A_sA_t.h]$, where $s, t = 1, \dots, r$ and
$h$ is less than the smaller one of $(s, t)$, and in comparison of the system of
equations (3.7°) with the final form of equations obtained through the
application of the Gauss method of substitution, we can see the relationship
between the Gauss method and the Gram-Schmidt orthogonalization process. If we
should like to give some credit to Gauss, we may say that the orthogonalization
process was known by him, but was stated in a different form.

The writer wishes to remark that certain theorems together with proofs in
sections II, IV, V and VI are obtained from E. H. Moore’s lecture notes.
However, the writer alone is responsible for any defects. Finally, I should
emphasize that the use of the notion of positive matrices is only for
convenience.

I. Vectors, Inner Products, and Linear Independence 

In this paper, we shall consider vectors of the form¹

(1.10)    $(v_1, v_2, \dots, v_n)$ .

For convenience, we shall use capital letters to denote vectors of the type
(1.10).

Let $V = (v_1, \dots, v_n)$ and $U = (u_1, u_2, \dots, u_n)$; then we say $V = U$ if
$v_i = u_i$ for every $i$.

We define $V + U$ by

(1.11)    $V + U = (v_1 + u_1, v_2 + u_2, \dots, v_n + u_n)$ ,

and $sV$, where $s$ is a number, by

(1.12)    $sV = (sv_1, sv_2, \dots, sv_n)$ .

Hence, $sV = Vs$. In particular, when $s = -1$, we shall put $-V = (-1)V$.
Then $U - V$ becomes a special instance of (1.11) and (1.12).

¹ If we write $v_i$ as $v(i)$, where $i = 1, 2, \dots, n$, then $v(i)$ may be
considered as a function of one variable whose range consists of a set of
positive integers $(1, 2, \dots, n)$. E. H. Moore defines a vector as a function
of one variable.

From (1.11) and (1.12), we see that addition is commutative and associative.

Inner Products: The inner product of two vectors $V = (v_1, \dots, v_n)$ and
$U = (u_1, \dots, u_n)$ is defined² to be

(1.2)    $(V, U) = \sum_{i=1}^{n} v_i u_i$ .

The norm of a vector $V$ is defined by $n(V) = (V, V)$; and the modulus of a
vector $V$ is defined by $\mathrm{mod}(V) = +\sqrt{n(V)}$.
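
The definitions just given translate directly into code. The following is a small Python sketch; the function names `inner`, `norm` and `mod` are chosen here for illustration and simply mirror the symbols $( , )$, $n( )$ and mod$( )$ of the text. Note that the norm in this paper is the squared length, not the length itself.

```python
from math import sqrt

def inner(v, u):
    """Inner product (V, U) = sum of v_i * u_i, as in (1.2)."""
    if len(v) != len(u):
        raise ValueError("vectors must have the same number of components")
    return sum(vi * ui for vi, ui in zip(v, u))

def norm(v):
    """Norm n(V) = (V, V); this is the squared length of V."""
    return inner(v, v)

def mod(v):
    """Modulus mod(V) = +sqrt(n(V))."""
    return sqrt(norm(v))

V = [3, 4]
U = [1, 2]
print(inner(V, U), norm(V), mod(V))  # 11 25 5.0
```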

From (1.11), (1.12), and (1.2), we can easily prove the following theorem:

Theorem 1. The symbol $( , )$ has the following properties:

(S) $(U, V) = (V, U)$ for every $V$, $U$; (symmetric property)

(L.) $(sV, U) = s(V, U) = (V, sU)$ for every $V$, $U$ and every number $s$;

(L+) $(U, (V + W)) = (U, V) + (U, W)$ for every $U$, $V$, $W$; (linear property)

(P) $(V, V) \geq 0$ for every $V$; (positive property)

(P₀) $(V, V) = 0$ if and only if $V$ is a zero vector. (properly positive property)

Linear Independence. A set of vectors $V_1, \dots, V_r$ is said to be linearly
dependent in case there exist constants $c_1, \dots, c_r$ not all equal to 0 such
that

$c_1V_1 + \cdots + c_rV_r = 0$ ,

where 0 is a zero vector.

A set of vectors $V_1, \dots, V_r$ is said to be linearly independent in case, if
the constants $c_1, \dots, c_r$ satisfy

$c_1V_1 + \cdots + c_rV_r = 0$ ,

each constant $c_i = 0$.

Theorem 2. If the set $V_1, \dots, V_r$ is linearly independent, then none of the
vectors is a zero vector, and hence the norm of every vector must be different
from zero.

For if $V_s$ is a zero vector, then set $c_s = 1$, and $c_i = 0$ for $i \neq s$.
It is obvious that

$0 \cdot V_1 + \cdots + 0 \cdot V_{s-1} + 1 \cdot V_s + 0 \cdot V_{s+1} + \cdots + 0 \cdot V_r = 0$ ,

which shows that the set of vectors $V_1, \dots, V_r$ is linearly dependent,
contradictory to the hypothesis.

A more general theorem is stated in 

Theorem 3. If the set $V_1, \dots, V_r$ is linearly independent, then every subset³
is also linearly independent.

We shall prove this theorem by a contrapositive form. The contrapositive
form is as follows: If in the set $V_1, \dots, V_r$ there exists a subset which is
linearly dependent, then the whole set is also linearly dependent. Without
losing any generality, let us suppose the subset $V_1, \dots, V_s$ $(s \leq r)$ to
be linearly dependent. Then there exist $c_1, \dots, c_s$, not all equal to 0,
such that

$c_1V_1 + \cdots + c_sV_s = 0$ .

If $s = r$, then the whole set is linearly dependent. If $s < r$, then let
$c_t = 0$ for $t = s + 1, s + 2, \dots, r$. Then

$\sum_{i=1}^{r} c_iV_i = 0$ ,

which shows the whole set is linearly dependent.

² The notation $( , )$ was introduced by D. Hilbert. In treatises on least
squares, the notation $[ \ ]$ is used. The present writer reserves the latter
notation for other purposes.

³ Consider a set of integers $(1, 2, \dots, n)$. Then any combination of this set
of $n$ distinct integers taken $r \leq n$ at a time is called a subset of the set
$(1, 2, \dots, n)$. Likewise, we call any combination of the set of vectors
$V_1, V_2, \dots, V_n$ taken $r \leq n$ at a time a subset of the whole set.

Theorem 4.⁴ A necessary and sufficient condition for the set
$V_i = (v_{i1}, \dots, v_{in})$, $i = 1, \dots, r$, to be linearly independent is
that there exists a non-vanishing determinant of order $r$ in the array

$v_{11}, v_{12}, \dots, v_{1n}$
$v_{21}, v_{22}, \dots, v_{2n}$
$\dots$
$v_{r1}, v_{r2}, \dots, v_{rn}$

II. Gram-Schmidt’s Orthogonalization Process

For the present section and the sequel, we shall adopt the notation
$A_i = (a_{i1}, \dots, a_{in})$, $B_i = (b_{i1}, \dots, b_{in})$, and
$C_i = (c_{i1}, \dots, c_{in})$ for $i = 1, 2, \dots, r$.

Theorem 5. For every set of vectors $A_1, \dots, A_r$, there exists uniquely a set
of vectors $B_1, \dots, B_r$ such that

5.1) $(B_t, B_s) = 0$ $(t \neq s)$.

5.2) For every $t$ satisfying the relation $1 \leq t \leq r$, $A_t$ is a linear
combination of $B_1, \dots, B_t$; and $B_t$ is a linear combination of
$A_1, \dots, A_t$.

5.3) $B_1 = A_1$; and for $t > 1$, $(B_t - A_t)$ is a linear combination of
$B_1, \dots, B_{t-1}$, and is also a linear combination of $A_1, \dots, A_{t-1}$.

5.4) If $t > 1$, then $(A_s, B_t) = 0$ for every $s < t$.

5.5) $(A_t, B_t) = (B_t, B_t) = (B_t, A_t)$ for every $t$.

To prove this theorem, let us define

$B_1 = A_1$,

$B_2 = A_2$    if $n(B_1) = 0$,
$B_2 = A_2 - \frac{(A_2, B_1)}{n(B_1)} B_1$    if $n(B_1) \neq 0$,

(2.1)    $\dots$

$B_t = A_t - \sum_{i=1}^{t-1} h_{ti} B_i$    $(1 \leq t \leq r)$,

where

(2.11)    $h_{ti} = (A_t, B_i)/n(B_i)$,    if $n(B_i) \neq 0$,
          $h_{ti} = 0$,    if $n(B_i) = 0$.

⁴ See Dickson, Modern Algebraic Theories, p. 65; Bôcher, Higher Algebra, p. 86.

We proceed to show that this set has the properties stated in the theorem.

To prove 5.1), let us suppose $t < s$. This assumption is permissible since the
operator $( , )$ has the symmetric property. First, if $A_1 = 0$, then $B_1 = 0$,
and

$(B_1, B_2) = (A_1, A_2) = (0, A_2) = 0$ .

Secondly, if $A_1 \neq 0$, then $B_1 \neq 0$ and

$(B_1, B_2) = (A_1, A_2 - h_{21}B_1) = (A_1, A_2) - (A_1, B_1)\frac{(A_2, B_1)}{n(B_1)}$
$= (A_1, A_2) - (A_1, A_1)(A_2, A_1)/n(A_1) = 0$ .

Assume 5.1) is true for all pairs of indices not exceeding $s - 1$; then

$(B_t, B_s) = \Bigl(B_t, A_s - \sum_{i=1}^{s-1} h_{si}B_i\Bigr) = (B_t, A_s) - \sum_{i=1}^{s-1} h_{si}(B_t, B_i)$ .

The sum on the right hand side reduces to $h_{st}(B_t, B_t)$, since the other
terms vanish by assumption. Now if $(B_t, B_t) \neq 0$, then by (2.11),
$h_{st}(B_t, B_t) = (A_s, B_t)$, and by the symmetric property of $( , )$, we
obtain

$(B_t, B_s) = (B_t, A_s) - (A_s, B_t) = 0$ .

If $(B_t, B_t) = 0$, then by the P₀-property of $( , )$, we find that $B_t$ is a
zero vector, and hence $(B_t, B_s) = 0$.

5.2) follows from the definition of $B_t$.

That $(A_t - B_t)$ is a linear combination of $B_1, \dots, B_{t-1}$ for $t > 1$
follows from the definition of $B_t$. Since $B_i$ is a linear combination of
$A_1, \dots, A_i$, we secure the second part of 5.3).

By 5.2), we can determine $g_{si}$ such that $A_s = \sum_{i=1}^{s} g_{si}B_i$. Thus
for every $s < t$, we have by 5.1)

$(A_s, B_t) = \Bigl(\sum_{i=1}^{s} g_{si}B_i, B_t\Bigr) = \sum_{i=1}^{s} g_{si}(B_i, B_t) = 0$ ,

which is 5.4).

By 5.3), there exist $g_{ti}$ such that $A_t - B_t = \sum_{i=1}^{t-1} g_{ti}B_i$, and
hence $A_t = B_t + \sum_{i=1}^{t-1} g_{ti}B_i$. Thus by 5.1), we have

$(A_t, B_t) = \Bigl(B_t + \sum_{i=1}^{t-1} g_{ti}B_i, B_t\Bigr) = (B_t, B_t) + \sum_{i=1}^{t-1} g_{ti}(B_i, B_t)$
$= (B_t, B_t)$ .

By the symmetric property of $( , )$, we secure $(B_t, A_t) = (A_t, B_t) = (B_t, B_t)$.

For the proof of uniqueness, let us suppose there exists a second set of vectors
$B'_1, \dots, B'_r$ having the properties 5.1), 5.2), 5.3), 5.4), and 5.5). By
5.3), we see that $B_1 = A_1 = B'_1$. Assuming the uniqueness holds true for
$r = t$, we proceed to show that it is also true for $r = t + 1$. By 5.3) there
exist constants $s_i$, $s'_i$ $(i = 1, \dots, t)$ such that

$B_{t+1} = A_{t+1} + \sum_{i=1}^{t} s_iA_i$ ,

$B'_{t+1} = A_{t+1} + \sum_{i=1}^{t} s'_iA_i$ .

Thus

$B_{t+1} - B'_{t+1} = \sum_{i=1}^{t} (s_i - s'_i)A_i$ .

From this, we secure

$(B_{t+1} - B'_{t+1}, B_{t+1} - B'_{t+1}) = \Bigl(B_{t+1} - B'_{t+1}, \sum_{i=1}^{t} (s_i - s'_i)A_i\Bigr)$
$= \sum_{i=1}^{t} (s_i - s'_i)(B_{t+1} - B'_{t+1}, A_i) = 0$ ,

by virtue of 5.4). Hence by the P₀-property of $( , )$, we have
$B_{t+1} - B'_{t+1} = 0$ and hence $B_{t+1} = B'_{t+1}$.

The set $B_1, \dots, B_r$ with the properties stated in Theorem 5 is called the
orthogonalized set of $A_1, \dots, A_r$. This process is called Gram-Schmidt’s
orthogonalization process.
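
The construction (2.1) and (2.11) can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `orthogonalize` and the sample vectors are assumptions made here, and exact fractions are used so that the properties 5.1), 5.4) and 5.5) can be checked without rounding error. The convention $h_{ti} = 0$ when $n(B_i) = 0$ is followed, so the routine also accepts linearly dependent sets.

```python
from fractions import Fraction

def inner(v, u):
    """Inner product (V, U) as in (1.2)."""
    return sum(x * y for x, y in zip(v, u))

def orthogonalize(A):
    """Build B_1, ..., B_r from A_1, ..., A_r by (2.1) and (2.11)."""
    B = []
    for a in A:
        b = [Fraction(x) for x in a]
        for prev in B:
            n_prev = inner(prev, prev)
            # (2.11): h_ti = (A_t, B_i)/n(B_i), or 0 when n(B_i) = 0
            h = inner(a, prev) / n_prev if n_prev != 0 else 0
            b = [x - h * p for x, p in zip(b, prev)]
        B.append(b)
    return B

A = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]
B = orthogonalize(A)
# B[1] is (1/2, -1/2, 1) and B[2] is (-2/3, 2/3, 2/3)
```

For these sample vectors one finds $n(B_1) = 2$, $n(B_2) = 3/2$ and $n(B_3) = 4/3$, and $(A_3, B_3) = (B_3, B_3)$ as property 5.5) requires.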

The set $B_1, \dots, B_r$ is called the normally orthogonalized set of
$A_1, \dots, A_r$ in case the former set enjoys the properties 5.1), 5.2), 5.3),
5.4), and if

5.5n) $(A_t, B_t) = (B_t, B_t) = (B_t, A_t) = 1$ for every $t$.

Theorem 6. If a subset $A_{k_1}, \dots, A_{k_m}$ $(1 \leq k_1 \leq \cdots \leq k_m \leq r)$
in the set $A_1, \dots, A_r$ is linearly independent, then there is a subset
$B_{k_1}, \dots, B_{k_m}$ which has the properties stated in Theorem 5, and it is
also linearly independent.

Let $h = k_m - k_1 + 1$. To prove the theorem, we may assume $k_1, \dots, k_m$ to
be $1, \dots, h \leq r$, for otherwise we may renumber the vectors. We construct
the $B$ vectors in the same way as given in equations (2.1) and (2.11). By
Theorem 5, we have

(2.2)    $B_1 = A_1$,  $B_s = A_s + \sum_{i=1}^{s-1} g_{si}A_i$    $(s = 2, \dots, h)$ .

Suppose the constants $c_1, \dots, c_h$ be such that

$c_1B_1 + \cdots + c_hB_h = 0$ .

Then by (2.2), we secure

$0 = c_1A_1 + \sum_{s=2}^{h} c_s\Bigl(A_s + \sum_{i=1}^{s-1} g_{si}A_i\Bigr)$
$= (c_1 + c_2g_{21} + \cdots + c_hg_{h1})A_1 + (c_2 + c_3g_{32} + \cdots + c_hg_{h2})A_2 + \cdots + c_hA_h$ .

Since $A_1, \dots, A_h$ are linearly independent, we have

          $c_1 + c_2g_{21} + \cdots + c_hg_{h1} = 0$ ,
          $c_2 + \cdots + c_hg_{h2} = 0$ ,
(2.3)     $\dots$
          $c_h = 0$ .

But the determinant of the coefficients of $c_i$ $(i = 1, \dots, h)$ is

$\begin{vmatrix} 1 & g_{21} & g_{31} & \cdots & g_{h1} \\ 0 & 1 & g_{32} & \cdots & g_{h2} \\ \vdots & & & & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{vmatrix} = 1$ .

Hence by a theorem in the theory of equations,⁵ the only solution that satisfies
(2.3) is $c_1 = c_2 = \cdots = c_h = 0$. Thus the subset $B_1, \dots, B_h$ is
linearly independent.

Corollary. The orthogonalized set $B_1, \dots, B_r$ is linearly independent if and
only if the set $A_1, \dots, A_r$ is linearly independent.

Theorem 7a. If a set of vectors $A_1, \dots, A_r$ is linearly independent, then the
set can be normally orthogonalized.

Let $B_i$ be the orthogonalized set of $A_i$. Since $A_i$ is a linearly
independent set, the set $B_i$ is also linearly independent by Theorem 6. Hence
by Theorem 2, the norm of every vector $B_i$ is non-vanishing. Define
$C_i = B_i/\mathrm{mod}(B_i)$. Then this set $C_i$ enjoys the properties 5.1),
5.2), 5.3), 5.4) and 5.5n).

Theorem 7b. If a set of vectors $V_1, \dots, V_r$ is normally orthogonal, i.e. if

(2.4)    $(V_i, V_j) = 1$ $(i = j)$,    $(V_i, V_j) = 0$ $(i \neq j)$,

then $V_1, \dots, V_r$ are linearly independent.

For suppose

$c_1V_1 + \cdots + c_rV_r = 0$ .

Then

$\sum_{i=1}^{r} c_i(V_i, V_j) = 0$,    $(j = 1, 2, \dots, r)$ .

By condition (2.4), the preceding expression reduces to

$c_j = 0$,    $(j = 1, 2, \dots, r)$ ,

which shows the linear independence of $V_1, \dots, V_r$.

⁵ Dickson, First Course in the Theory of Equations (1922), p. 119.

III. Algebraic Derivation of the Normal Equations 

Consider a linear function

(3.1)    $l = p_1x_1 + p_2x_2 + \cdots + p_rx_r = \sum_{i=1}^{r} p_ix_i$ .

Let the set of observations of $x_i$ and $l$ be

(3.2)    $A_i = (a_{i1}, \dots, a_{in})$,  $L = (l_1, \dots, l_n)$    $(i = 1, \dots, r;\ n \geq r)$

respectively; then the residual $v_j$ is

$v_j = \sum_{i=1}^{r} p_ia_{ij} - l_j$    $(j = 1, \dots, n)$ .

In vector notation,

$V = \sum_{i=1}^{r} p_iA_i - L$ .

The theory of least squares requires us to find the values for $p_1, \dots, p_r$ so
as to make $(V, V)$ a minimum, or

(3.3°)    $\Bigl(\sum p_iA_i - L, \sum p_iA_i - L\Bigr) =$ a minimum.

Let $A_1, \dots, A_r$ be linearly independent. By Theorem 7a, the vectors
$A_1, \dots, A_r$ can be normally orthogonalized. Let $C_1, \dots, C_r$ be the
normally orthogonal set. Then every $A_t$ $(t = 1, \dots, r)$ is expressible as a
linear combination of $C_1, \dots, C_t$. Let us write

(3.3)    $\sum_{i=1}^{r} p_iA_i = \sum_{i=1}^{r} k_iC_i$ .

Our problem now is equivalent to that of finding the values $k_i$
$(i = 1, \dots, r)$ so as to render the inner product

(3.4)    $\Bigl(\sum k_iC_i - L, \sum k_iC_i - L\Bigr)$

a minimum. Expression (3.4) can be written in the form

          $(L, L) - 2\sum (L, C_i)k_i + \sum (k_iC_i, k_iC_i)$
(3.5)     $= (L, L) - 2\sum (L, C_i)k_i + \sum k_i^2$
          $= (L, L) - \sum (L, C_i)^2 + \sum (k_i - (C_i, L))^2$ .

Hence (3.4) gives a minimum if and only if the last summation vanishes, i.e.,

(3.6)    $k_i = (C_i, L)$    $(i = 1, \dots, r)$ .

Bessel’s inequality

$\sum_{i=1}^{r} k_i^2 \leq (L, L)$

is obtained from (3.6), (3.4), and (3.5).

To solve for $p_i$, we make use of (3.3) and (3.6), and secure

$\sum_{i=1}^{r} p_iA_i = \sum_{i=1}^{r} (C_i, L)C_i$ ,

whence

$\Bigl(C_k, \sum_{i=1}^{r} p_iA_i\Bigr) = \Bigl(C_k, \sum_{i=1}^{r} (C_i, L)C_i\Bigr)$ .

On the right hand side we have

$\Bigl(C_k, \sum (C_i, L)C_i\Bigr) = \sum (C_i, L)(C_k, C_i) = (C_k, L)$ ,

since $(C_k, C_i) = 0$ when $i \neq k$, and $(C_k, C_i) = 1$ when $i = k$. On the
left hand side, we have

$\Bigl(C_k, \sum_{i=1}^{r} p_iA_i\Bigr) = \sum_{i=1}^{r} (C_k, A_i)p_i = \sum_{i=k}^{r} (C_k, A_i)p_i$ ,

since $(C_k, A_i) = 0$ when $i < k$. Hence the values for $p_1, \dots, p_r$ are
given by

(3.7)    $\sum_{i=k}^{r} (C_k, A_i)p_i = (C_k, L)$    $(k = 1, \dots, r)$ ,

where $(C_i, A_i) = (C_i, C_i) = 1$.

Equations (3.7) are called the normal equations, which are derived without
using any notion in differential calculus.

From (3.6) and (3.5), we secure the value for the ‘quadratic residual’ $(V, V)$:

(3.8)    $(V, V) = (L, L) - \sum_{i=1}^{r} (L, C_i)^2$ ,

which is a positive quantity by virtue of Bessel’s inequality.

Let $B_1, \dots, B_r$ be an orthogonalized set of $A_1, \dots, A_r$. Then every
vector $B_i$ has a non-vanishing norm, and $B_i = \mathrm{mod}(B_i)\,C_i$. Hence
from (3.7) and (3.8), we have

(3.7°)    $\sum_{i=k}^{r} (B_k, A_i)p_i = (B_k, L)$,    $(k = 1, 2, \dots, r)$ ,

(3.8°)    $(V, V) = (L, L) - \sum_{i=1}^{r} (L, B_i)^2/n(B_i)$ .

Thus we have proved the following 

Theorem 8. Given a linear function (3.1). Let the set of observations of $x_i$
and $l$ be

$A_i = (a_{i1}, \dots, a_{in})$,  $L = (l_1, \dots, l_n)$    $(i = 1, \dots, r;\ n \geq r)$

respectively. Let $A_1, \dots, A_r$ be linearly independent, $B_1, \dots, B_r$ be the
orthogonalized set, and $C_1, \dots, C_r$ the normally orthogonalized set of
$A_1, \dots, A_r$. Then the set of values $p_1, \dots, p_r$ will minimize (3.3°) if
and only if the system of equations (3.7°) or (3.7) holds true; in other words,
$\sum_{i=1}^{r} p_iA_i - L$ is orthogonal to $C_j$ or to $B_j$ for every $j$. The
quadratic residual $(V, V)$ is given by (3.8°) or (3.8).

From (3.7), we can secure the solution for $p_1, \dots, p_r$ immediately without
further application of the Gauss method of substitution.
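
As an illustration of how the triangular system (3.7°) yields the parameters directly, the following Python sketch orthogonalizes the observation vectors and then solves for $p_r, p_{r-1}, \dots, p_1$ in turn. The function names and the three-point data set are assumptions introduced for this example; the sketch presumes that $A_1, \dots, A_r$ are linearly independent, so that every $(B_k, A_k) = n(B_k)$ is different from zero.

```python
from fractions import Fraction

def inner(v, u):
    return sum(x * y for x, y in zip(v, u))

def orthogonalize(A):
    """Orthogonalized set B_1, ..., B_r of A_1, ..., A_r, as in (2.1)."""
    B = []
    for a in A:
        b = [Fraction(x) for x in a]
        for prev in B:
            n_prev = inner(prev, prev)
            h = inner(a, prev) / n_prev if n_prev != 0 else 0
            b = [x - h * p for x, p in zip(b, prev)]
        B.append(b)
    return B

def least_squares(A, L):
    """Solve the triangular system (3.7°) by back substitution."""
    B = orthogonalize(A)
    r = len(A)
    p = [Fraction(0)] * r
    for k in range(r - 1, -1, -1):
        known = sum(inner(B[k], A[i]) * p[i] for i in range(k + 1, r))
        p[k] = (inner(B[k], L) - known) / inner(B[k], A[k])
    return p

def quadratic_residual(A, L):
    """(V, V) computed from (3.8°)."""
    B = orthogonalize(A)
    return inner(L, L) - sum(inner(L, b) ** 2 / inner(b, b) for b in B)

# Hypothetical data: fit l = p_1 + p_2 * t at t = 0, 1, 2.
A = [[1, 1, 1], [0, 1, 2]]
L = [1, 3, 4]
p = least_squares(A, L)          # p_1 = 7/6, p_2 = 3/2
q = quadratic_residual(A, L)     # 1/6
```

The same value $1/6$ is obtained by forming the residual vector $V = (1/6, -1/3, 1/6)$ and computing $(V, V)$ directly.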

The proof of the following theorem does not make use of the orthogonalization
process.⁶

Theorem 8°. Let $F = \sum p_iA_i$, where no $A_i$ is a zero vector. The set of
values $p_1, \dots, p_r$ will minimize (3.3°) if and only if $(F - L, A_i) = 0$ for
every $i$, i.e., $F - L$ is orthogonal to every $A_i$.

The condition is necessary. To prove this, we show that if $(F - L, A_i) = 0$
fails to hold for every $i$, then we can find another set $q_1, \dots, q_r$ such
that $n(F - L) > n(G - L)$, where $G = \sum q_iA_i$. For if $(F - L, A_i) = 0$
does not hold for every $i$, then we can find a vector $A_s$ such that
$(F - L, A_s) \neq 0$. Since $A_s \neq 0$, we let
$e = (F - L, A_s)/n(A_s)$ and $G = F - eA_s = \sum q_iA_i$. Then

$n(G - L) = n(F - eA_s - L) = n(F - L) - (F - L, A_s)^2/n(A_s)$ ,

which shows that $n(G - L) < n(F - L)$.

To prove the sufficiency, we show that for every set $q_1, \dots, q_r$ different
from $p_1, \dots, p_r$, $n(G - L) > n(F - L)$, where $G = \sum q_iA_i$. Let
$s_i = q_i - p_i$ and $H = \sum s_iA_i$. Then $G = F + H$. Now if
$(F - L, A_i) = 0$ for every $i$, it follows that

$(F - L, H) = \sum_{i=1}^{r} (F - L, A_i)s_i = 0$ .

Thus

$n(G - L) = n(F - L) + n(H)$ .

Since $n(H) > 0$, we have $n(G - L) > n(F - L)$.

The preceding theorem does not require the linear independence of the
vectors $A_1, \dots, A_r$. By Theorems 7a and 7b we see that it is necessary and
sufficient for the set $A_1, \dots, A_r$ to be linearly independent in order to
solve the equations $(F - L, A_i) = 0$, $(i = 1, 2, \dots, r)$, or

          $(A_1, A_1)p_1 + (A_1, A_2)p_2 + \cdots + (A_1, A_r)p_r = (A_1, L)$
(3.9)     $\dots$
          $(A_r, A_1)p_1 + (A_r, A_2)p_2 + \cdots + (A_r, A_r)p_r = (A_r, L)$ .

⁶ The proof is based on the same type of reasoning as used by Jackson. See Dunham
Jackson's Theory of Approximation, pp. 161-152.

If $A_1, \dots, A_r$ are linearly independent, the conclusion in Theorem 8° can be
deduced from Theorem 8. For by Theorem 7a, $A_t = \sum_{i=1}^{t} s_{ti}C_i$, and
hence

$(F - L, A_t) = \Bigl(F - L, \sum_{i=1}^{t} s_{ti}C_i\Bigr) = \sum_{i=1}^{t} s_{ti}(F - L, C_i) = 0$ .

Also, Theorem 8 can be deduced from Theorem 8°.
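
The system (3.9) and the orthogonality condition of Theorem 8° can be checked numerically on a small case. The sketch below is illustrative only: the helper `normal_equations` and the data are assumptions, and the parameter values $p_1 = 7/6$, $p_2 = 3/2$ were worked out by hand for this particular data set rather than taken from the paper.

```python
from fractions import Fraction

def inner(v, u):
    return sum(x * y for x, y in zip(v, u))

def normal_equations(A, L):
    """Coefficients (A_i, A_j) and right-hand sides (A_i, L) of (3.9)."""
    gram = [[inner(a_i, a_j) for a_j in A] for a_i in A]
    rhs = [inner(a_i, L) for a_i in A]
    return gram, rhs

# Hypothetical observations of x_1, x_2 and of l.
A = [[1, 1, 1], [0, 1, 2]]
L = [1, 3, 4]
gram, rhs = normal_equations(A, L)   # [[3, 3], [3, 5]] and [8, 11]

# Hand-computed least squares values for this data.
p = [Fraction(7, 6), Fraction(3, 2)]
F = [sum(p_i * a[j] for p_i, a in zip(p, A)) for j in range(len(L))]
residual = [f - l for f, l in zip(F, L)]   # F - L
```

With these values $F - L = (1/6, -1/3, 1/6)$, which is orthogonal to both $A_1$ and $A_2$, and the same $p_1$, $p_2$ satisfy the two equations of (3.9).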


IV. Matrices and Their Reciprocals

An ordered array of numbers of the form

(4.1)    $\alpha = (a_{ij}) = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1m} \\ a_{21} & a_{22} & \cdots & a_{2m} \\ \vdots & & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nm} \end{pmatrix}$

is a matrix. If we write $\alpha(i, j) = a_{ij}$, then the array of numbers (4.1)
may be considered as a function of two variables $i$, $j$ on the ranges of
positive integers $(1, 2, \dots, n)$, $(1, 2, \dots, m)$.⁷ Thus a vector is a
special instance of a matrix. We shall use Greek letters to denote matrices
throughout this paper unless otherwise specified. When $n = m$, i.e. the number
of rows is the same as the number of columns, we have a square matrix.
Associated with every $n$-row square matrix $\kappa$, a determinant can be
defined, and for simplicity, we shall adopt the following notation:

$D(\kappa) = \begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{vmatrix}$ .

An identity matrix, denoted by $\delta = (d_{ij})$, is a square matrix of which
the elements in the principal diagonal are 1 and elsewhere 0, i.e.
$d_{ij} = 0$ $(i \neq j)$, $d_{ii} = 1$. A zero matrix, indicated by a, is one
such that every one of its elements is 0. The transposed matrix, $\alpha'$, of
$\alpha$ is formed by interchanging the rows and columns. We say two matrices
$\alpha = (a_{ij})$ and $\beta = (b_{ij})$ are equal in case $a_{ij} = b_{ij}$ for
every $i$, $j$. A matrix $\alpha$ is symmetric in case $\alpha' = \alpha$. The
$j$th column of $\alpha$ is indicated by $\alpha(\cdot, j)$, the $i$th row by
$\alpha(i, \cdot)$, and the element in the $i$th row and $j$th column by
$\alpha(i, j)$. Hence $\alpha(i, j) = a_{ij}$.

Addition: Let $\alpha$ be a matrix given by (4.1) and $\beta = (b_{ij})$ a matrix
of the same number of rows and columns as $\alpha$. Then

$\alpha + \beta = (a_{ij} + b_{ij})$ .

We note that $\alpha + \beta = \beta + \alpha$. If $\gamma$ is a matrix of the same
number of rows and columns as $\alpha$, then
$(\alpha + \beta) + \gamma = \alpha + (\beta + \gamma)$.

⁷ E. H. Moore defines a matrix as a function of two variables.

Multiplication: Let $\alpha = (a_{ij})$ be defined by (4.1), and
$\beta = (b_{jk})$ be a matrix of $m$ rows and $r$ columns; then the product
$\pi = \alpha\beta$ is defined by

$\pi = (p_{ik})$,    $p_{ik} = \sum_{j=1}^{m} a_{ij}b_{jk}$ .

Thus $\pi$ is a matrix of $n$ rows and $r$ columns.

The multiplication of two matrices is not necessarily commutative.

If $\alpha$ is a matrix of $n$ rows and $m$ columns, $\beta$ of $m$ rows and $r$
columns, and $\gamma$ of $r$ rows and $s$ columns, then
$\alpha(\beta\gamma) = (\alpha\beta)\gamma$. If $\alpha$ is a matrix of $n$ rows
and $m$ columns, and $\beta$, $\gamma$ are matrices of $m$ rows and $r$ columns,
then $\alpha(\beta + \gamma) = \alpha\beta + \alpha\gamma$.

Scalar Multiplication: Let $s$ be a number, and $\alpha$ be a matrix of $n$ rows
and $m$ columns; then

$s \cdot \alpha = (sa_{ij}) = \alpha \cdot s$ .

Let $\delta_s$ denote a square matrix of $n$ rows in which the elements in the
principal diagonal are $s$, and 0 elsewhere. Then $\delta_s = s\delta$, where
$\delta$ is an $n$-row identity matrix. We note from the associative law of
multiplication that

$s\alpha = \delta_s \cdot \alpha = \alpha \cdot \delta_s$ .

In particular, let $s = -1$; then we have $-1\alpha$. For convenience, we write
$-\alpha = -1\alpha$. From the definition of addition, we obtain a definition of
subtraction for two matrices of the same number of rows and columns.

Reciprocals of Matrices: Let $\alpha$ be a matrix of $n$ rows and $m$ columns.
Then a matrix $\alpha^{-1}$ of $m$ rows and $n$ columns is said to be a reciprocal
of $\alpha$ in case

$\alpha \cdot \alpha^{-1} = \delta^n$ ,  and  $\alpha^{-1} \cdot \alpha = \delta^m$ ,

where $\delta^n$, $\delta^m$ are identity matrices of order $n$, $m$ respectively.
If a matrix $\alpha$ has a reciprocal $\alpha^{-1}$, we can prove $\alpha^{-1}$ is
unique. It can be shown that when $\alpha$ has a reciprocal, it must be a square
matrix.⁸

A matrix is said to be non-singular in case it has a reciprocal; otherwise it is
said to be singular.⁹ It is evident that every zero matrix is singular, and an
identity matrix is non-singular.

Suppose $\alpha$ is a square matrix of order $n$. Let us denote the cofactor of
the element $a_{ij}$ of $\alpha$ by $e_{ij}$. Then

$\epsilon = \begin{pmatrix} e_{11} & e_{21} & \cdots & e_{n1} \\ \vdots & & & \vdots \\ e_{1n} & e_{2n} & \cdots & e_{nn} \end{pmatrix}$

is called the adjoint matrix of $\alpha$.

⁸ For the proof of this statement, see Moore, Vector, Matrices, and Quaternions.

⁹ This definition is due to E. H. Moore.

If $\alpha$ is symmetric, then $\epsilon$ is also symmetric. Since
$a_{i1}e_{j1} + \cdots + a_{in}e_{jn} = D(\alpha)$ or 0 according as $i = j$ or
$i \neq j$, we secure the following:

Theorem 9. Let $\alpha$ be a square matrix and $\epsilon$ its adjoint; then

$\alpha\epsilon = \epsilon\alpha = [D(\alpha)]\delta$ .

Theorem 10. If the determinant of $\alpha$ is different from zero, then there
exists a reciprocal $\alpha^{-1}$, and $\alpha^{-1} = \mathrm{adj}\,\alpha/D(\alpha)$.

This theorem follows from Theorem 9.

The converse of Theorem 10 is also true.
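
The relation between a matrix, its adjoint and its reciprocal can be verified on a small example. The Python sketch below is only an illustration under stated assumptions: the function names are chosen here, the determinant is computed by cofactor expansion (adequate for small orders, not efficient for large ones), and integer entries are assumed so that exact fractions can be used.

```python
from fractions import Fraction

def det(m):
    """Determinant by cofactor expansion along the first row."""
    if not m:
        return 1  # determinant of the empty matrix
    total = 0
    for j in range(len(m)):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

def adjoint(m):
    """Adjoint matrix: entry (i, j) is the cofactor of a_ji."""
    n = len(m)

    def cofactor(i, j):
        minor = [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]
        return (-1) ** (i + j) * det(minor)

    return [[cofactor(j, i) for j in range(n)] for i in range(n)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def reciprocal(m):
    """Reciprocal adj(m)/D(m); requires D(m) different from zero."""
    d = det(m)
    if d == 0:
        raise ValueError("matrix is singular")
    return [[Fraction(e, d) for e in row] for row in adjoint(m)]

alpha = [[2, 1], [1, 3]]
eps = adjoint(alpha)      # [[3, -1], [-1, 2]]
inv = reciprocal(alpha)   # entries 3/5, -1/5, -1/5, 2/5
```

Here $D(\alpha) = 5$, and both products $\alpha\epsilon$ and $\epsilon\alpha$ equal $5\delta$.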


V. Symmetric Matrices of Positive Type¹⁰

Let $\alpha = (a_{ij})$ be a matrix of $n$ rows and $m$ columns; and let
$\sigma = (k_1, \dots, k_n)$ and $\rho = (h_1, \dots, h_m)$ be integers among the
sets $(1, \dots, n)$ and $(1, \dots, m)$ respectively. The subsets $\sigma$ and
$\rho$ may be equal to the whole sets $(1, \dots, n)$ and $(1, \dots, m)$
respectively. Then

(3)    $\alpha(\sigma, \rho) = \begin{pmatrix} a_{k_1h_1} & \cdots & a_{k_1h_m} \\ \vdots & & \vdots \\ a_{k_nh_1} & \cdots & a_{k_nh_m} \end{pmatrix}$

is called a minor of $\alpha$. In notation we write this minor as
$\alpha(\sigma, \rho)$, indicating the ranges to be $\sigma$ and $\rho$.

The minor $\alpha(-\sigma, -\rho)$, which is obtained by striking out all the
$h_i$th $(i = 1, \dots, m)$ columns and $k_j$th $(j = 1, \dots, n)$ rows from
$\alpha$, is called the complementary minor of $\alpha(\sigma, \rho)$.

If $\alpha$ is a square matrix of order $n$, then $\alpha(\sigma, \sigma)$ is
called a principal minor of $\alpha$.

Let $\alpha$ and $\beta$ be matrices of $n$ rows and $m$ columns; and let
$\sigma$, $\rho$ have the same meaning as above. Then $\alpha(\sigma, \rho)$,
$\beta(\sigma, \rho)$ are called corresponding minors in $\alpha$, $\beta$
respectively.

A symmetric matrix $\alpha = (a_{ij})$ of order $n$ is said to be of positive type
in case the determinant of every principal minor of $\alpha$ is positive (that
is, $\geq 0$), and is said to be of properly positive type in case the
determinant of every principal minor of $\alpha$ is greater than zero.
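
The two definitions can be tested mechanically for a small matrix by running through all principal minors. The following Python sketch is an illustration only; the function names are assumptions made here, and the exhaustive enumeration of subsets $\sigma$ grows exponentially with the order, so it is meant for checking small examples.

```python
from itertools import combinations

def det(m):
    """Determinant by cofactor expansion along the first row."""
    if not m:
        return 1
    total = 0
    for j in range(len(m)):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

def principal_minor_determinants(m):
    """Yield (sigma, D[m(sigma, sigma)]) for every non-empty subset sigma."""
    n = len(m)
    for size in range(1, n + 1):
        for sigma in combinations(range(n), size):
            yield sigma, det([[m[i][j] for j in sigma] for i in sigma])

def is_positive(m):
    """Positive type: every principal minor has determinant >= 0."""
    return all(d >= 0 for _, d in principal_minor_determinants(m))

def is_properly_positive(m):
    """Properly positive type: every principal minor has determinant > 0."""
    return all(d > 0 for _, d in principal_minor_determinants(m))

print(is_properly_positive([[2, 1], [1, 3]]))  # True
print(is_positive([[1, 1], [1, 1]]))           # True
print(is_properly_positive([[1, 1], [1, 1]]))  # False
print(is_positive([[1, 2], [2, 1]]))           # False
```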

Corollary V1. Every element in the principal diagonal of a positive, symmetric
matrix is positive.

For, let $\sigma$ consist of a single integer $i$; then
$\alpha(\sigma, \sigma) = a_{ii} \geq 0$.

Corollary V2. If a symmetric matrix is properly positive, then every element
in the principal diagonal is greater than 0.

Theorem 11. If a symmetric matrix $\alpha$ of order $n$ is (properly) positive,
then its adjoint matrix $\epsilon$ is also symmetric and (properly) positive.

¹⁰ We follow the terminology of E. H. Moore. Moore developed this notion quite
extensively.

The symmetry of $\epsilon$ is evident. Let $\sigma$ be a subset of
$(1, \dots, n)$ and let $p$ be the number of integers in $\sigma$. Consider any
principal minor $\epsilon(\sigma, \sigma)$ in the adjoint matrix $\epsilon$. By a
theorem in the theory of determinants, we have¹¹

$D[\epsilon(\sigma, \sigma)] = (-1)^{2k} D[\alpha(-\sigma, -\sigma)] \cdot [D(\alpha)]^{p-1}$ ,

where $k$ is an integer depending on the set $\sigma$. By hypothesis $\alpha$ is
positive (properly positive); hence $D[\alpha(-\sigma, -\sigma)]$ and
$[D(\alpha)]^{p-1}$ are positive (greater than 0), and it follows that
$D[\epsilon(\sigma, \sigma)]$ is positive (greater than 0).

Theorem 12. If a symmetric matrix $\alpha$ is properly positive, then $D(\alpha)$
is different from zero, and $\alpha$ has a reciprocal $\alpha^{-1}$, which is also
symmetric and properly positive.

For take $\sigma$ to be the whole set $(1, \dots, n)$ in the definition of proper
positiveness, and we see that $D(\alpha) > 0$. The theorem now follows from
Theorems 10 and 11.

VI. Gramian Matrices 

In this section, we shall study the matrices of the normal equations (3.9).
The main result is that if the set of observations $A_1, \dots, A_r$ is linearly
independent, then the matrix (called Gramian matrix) is properly positive and
has a reciprocal which is also properly positive.

Theorem 13. Let $A_1, \dots, A_r$ be a set of vectors, and let $B_1, \dots, B_r$ be
the orthogonalized set of vectors. Then the matrix

(6.1)    $\zeta(A_1, \dots, A_r) = \begin{pmatrix} (A_1, A_1) & \cdots & (A_1, A_r) \\ \vdots & & \vdots \\ (A_r, A_1) & \cdots & (A_r, A_r) \end{pmatrix}$

has the following properties:

13.1) symmetry,

13.2) $D[\zeta(A_1, \dots, A_r)] = n(B_1)n(B_2) \cdots n(B_r)$,

13.3) positiveness.

A matrix of the form (6.1) is called a Gramian matrix.

In fact, the symmetric property follows from the fact that
$(A_i, A_j) = (A_j, A_i)$ for every $i$, $j$.

We shall prove 13.2) by induction. For $r = 1$, we have by Theorem 5

$(A_1, A_1) = (B_1, B_1) = n(B_1)$ .

Assume the equality is true for $r = t$; we shall show it is true for
$r = t + 1$. The $(t + 1)$-row determinant is as follows:

(6.2)    $D[\zeta(A_1, \dots, A_{t+1})] = \begin{vmatrix} (A_1, A_1) & \cdots & (A_1, A_t) & (A_1, A_{t+1}) \\ \vdots & & \vdots & \vdots \\ (A_{t+1}, A_1) & \cdots & (A_{t+1}, A_t) & (A_{t+1}, A_{t+1}) \end{vmatrix}$ .

¹¹ In case $\sigma = (1, \dots, n)$, $-\sigma$ is a null class $\Lambda$ (a class
which contains no element); then we define
$D[\alpha(-\sigma, -\sigma)] = 1$. For the proof of this theorem, see Bôcher,
p. 31.

By Theorem 5, there exist constants $s_j$ $(j = 1, \dots, t)$ such that

$A_{t+1} = B_{t+1} + \sum_{j=1}^{t} s_jA_j$ .

Substituting this value into the last row, we find the element in the $i$th
column is

$(A_i, A_{t+1}) = \Bigl(A_i, B_{t+1} + \sum_{j=1}^{t} s_jA_j\Bigr) = (A_i, B_{t+1}) + \sum_{j=1}^{t} s_j(A_i, A_j)$
$(i = 1, \dots, t + 1)$ .

The second term on the right is a linear combination of the first $t$ elements in
the $i$th column of the determinant (6.2), and hence by the theory of
determinants,¹² we secure

$D[\zeta(A_1, \dots, A_{t+1})] = \begin{vmatrix} (A_1, A_1) & \cdots & (A_1, A_t) & (A_1, A_{t+1}) \\ \vdots & & \vdots & \vdots \\ (A_t, A_1) & \cdots & (A_t, A_t) & (A_t, A_{t+1}) \\ (A_1, B_{t+1}) & \cdots & (A_t, B_{t+1}) & (A_{t+1}, B_{t+1}) \end{vmatrix}$ .

By Theorem 5, we find that $(A_i, B_{t+1}) = 0$ for $i = 1, \dots, t$, and
$(A_{t+1}, B_{t+1}) = (B_{t+1}, B_{t+1})$, and hence the preceding determinant
reduces to a form in which the first $t$ elements in the $(t + 1)$th row are
zero. Thus

$D[\zeta(A_1, \dots, A_{t+1})] = n(B_{t+1}) \begin{vmatrix} (A_1, A_1) & \cdots & (A_1, A_t) \\ \vdots & & \vdots \\ (A_t, A_1) & \cdots & (A_t, A_t) \end{vmatrix}$
$= n(B_1)n(B_2) \cdots n(B_t)n(B_{t+1})$ ,

which proves 13.2).
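
Property 13.2) lends itself to a direct numerical check. In the Python sketch below, which is illustrative only, the helper names and the sample vectors are assumptions introduced here; exact fractions are used so that the two sides of 13.2) can be compared without rounding.

```python
from fractions import Fraction

def inner(v, u):
    return sum(x * y for x, y in zip(v, u))

def orthogonalize(A):
    """Orthogonalized set B_1, ..., B_r of A_1, ..., A_r, as in (2.1)."""
    B = []
    for a in A:
        b = [Fraction(x) for x in a]
        for prev in B:
            n_prev = inner(prev, prev)
            h = inner(a, prev) / n_prev if n_prev != 0 else 0
            b = [x - h * p for x, p in zip(b, prev)]
        B.append(b)
    return B

def det(m):
    """Determinant by cofactor expansion along the first row."""
    if not m:
        return 1
    total = 0
    for j in range(len(m)):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

def gramian(A):
    """Matrix (6.1) of inner products (A_i, A_j)."""
    return [[inner(a_i, a_j) for a_j in A] for a_i in A]

def product_of_norms(A):
    result = Fraction(1)
    for b in orthogonalize(A):
        result *= inner(b, b)
    return result

A = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]
print(det(gramian(A)), product_of_norms(A))  # 4 4
```

For a linearly dependent pair such as $(1, 2)$ and $(2, 4)$ both sides are zero, since $B_2$ is then a zero vector.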

Consider any subset $\sigma = (k_1, \dots, k_m)$ of the set $(1, \dots, r)$. By
the same argument as above, we find that the determinant of any principal minor

(6.3)    $\begin{vmatrix} (A_{k_1}, A_{k_1}) & \cdots & (A_{k_1}, A_{k_m}) \\ \vdots & & \vdots \\ (A_{k_m}, A_{k_1}) & \cdots & (A_{k_m}, A_{k_m}) \end{vmatrix} = n(B_{k_1}) \cdots n(B_{k_m})$ .

By Theorem 1, the number on the right is positive. Thus the matrix $\zeta$ is
positive.

Theorem 14. The following three assertions are equivalent:

14.1) the set $A_1, \dots, A_r$ is linearly independent;

14.2) the Gramian matrix (6.1) is properly positive;

14.3) the determinant of the Gramian matrix (6.1) is different from zero.

We shall prove that 14.1) implies 14.2); 14.2) implies 14.3); and 14.3) implies
14.1). We thus prove the three statements are equivalent.

¹² Dickson, First Course in the Theory of Equations (1922), p. 113.

Let $B_1, \dots, B_r$ be the orthogonalized set of the set $A_1, \dots, A_r$. Since
the set $A_1, \dots, A_r$ is linearly independent, every subset

$A_{k_1}, \dots, A_{k_m}$    $(1 \leq k_1 \leq \cdots \leq k_m \leq r)$

is also linearly independent, and hence $n(B_{k_i}) > 0$ for
$i = 1, 2, \dots, m$. By the same argument as given in the demonstration of
Theorem 13, we find that the determinant of any principal minor (6.3) is
greater than zero. This proves the matrix (6.1) is properly positive.

If the matrix (6.1) is properly positive, then by Theorem 12 the determinant
of (6.1) is different from zero.

To prove 14.3) implies 14.1), suppose $k_i$ $(i = 1, \dots, r)$ are such that

$k_1A_1 + \cdots + k_rA_r = 0$ .

Then

$(k_1A_1 + \cdots + k_rA_r, A_i) = k_1(A_1, A_i) + \cdots + k_r(A_r, A_i) = 0$

for $i = 1, \dots, r$. Since $(A_i, A_j) = (A_j, A_i)$, and $D(\zeta) \neq 0$, the
set of constants $k_i$ must be all equal to 0.¹³

From Theorem 14 and Theorem 12, we may state the following

Corollary: If the set of observations $A_1, \dots, A_r$ is linearly independent,
then the Gramian matrix $\zeta$ has a reciprocal which is properly positive.
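
Theorem 14 gives a practical test for the independence of a set of observations: form the Gramian matrix and see whether its determinant vanishes. The following Python sketch illustrates this; the function name `is_linearly_independent` and the sample vectors are assumptions made for the example, and integer or exact rational entries are assumed so that the comparison with zero is exact.

```python
def inner(v, u):
    return sum(x * y for x, y in zip(v, u))

def det(m):
    """Determinant by cofactor expansion along the first row."""
    if not m:
        return 1
    total = 0
    for j in range(len(m)):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

def is_linearly_independent(A):
    """Use 14.1) <=> 14.3): independent iff the Gramian determinant is non-zero."""
    gram = [[inner(a_i, a_j) for a_j in A] for a_i in A]
    return det(gram) != 0

print(is_linearly_independent([[1, 1, 0], [1, 0, 1], [0, 1, 1]]))  # True
print(is_linearly_independent([[1, 2, 3], [2, 4, 6]]))             # False
```

With floating-point observations an exact comparison with zero is not reliable, so a tolerance would be needed; that refinement is outside the scope of this sketch.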


VII. Gauss Method of Suhstitution 

Lemma 7.1) Let tp = (s„) be an r-row symmetric matrix such that Sn 0. 
Then there exists an r-row square matrix r whose determinant is unity such that 
if = (r„) = Tip has the following properties: 

a) rii = 0/or z > 1, and ru = sufor every z; 

b) the first minor of rn is symmetric; 

c) the determinant of every principal minor in yf of the form 



fill Siti • 

• Slkm 

<7.1) 


■ 





(2 g fcj ^ ^ £ r) 


is equal to fhe determinant of the corresponding principal minor in <p. 

To prove this lemma, let us define 

(7.2) r=54-^’i-A, 

where Z>i is the first row of an r-row identity matrix S, and jPi(1) = 0, 

Fi(zz). = -si»Aii (ra > 1). 

(Thus J'lDi is an r-row square matrix in which the first column is Fx and every- 
where else 0.) It is clear that r thus defined is a square matrix of order r, and 


See footnote 5. 
$D(\tau) = D(\delta + F_1 D_1) = 1$. By multiplication of these two matrices, $\tau\varphi$, we obtain a new matrix $\psi$ such that $r_{11} = s_{11}$, $r_{i1} = 0$ for $i > 1$, and $r_{1i} = s_{1i}$ for every $i$, and further

\[ r_{ij} = s_{ij} - s_{1i}\,s_{1j}/s_{11} \qquad \text{for } i > 1,\ j \ge 1 . \tag{7.3} \]

To prove property (b), we note that $s_{ij} = s_{ji}$, since $\varphi$ is symmetric. Thus for $i > 1$, $j > 1$, we note from (7.3) that

\[ r_{ij} = s_{ij} - s_{1i}\,s_{1j}/s_{11} = s_{ji} - s_{1j}\,s_{1i}/s_{11} = r_{ji} . \]

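For the proof of the last property, we note that the corresponding minor of (7.1) in $\varphi$ is of the form

\[
\begin{pmatrix}
s_{11} & s_{1k_1} & \cdots & s_{1k_m} \\
s_{k_1 1} & s_{k_1k_1} & \cdots & s_{k_1k_m} \\
\vdots & \vdots & & \vdots \\
s_{k_m 1} & s_{k_mk_1} & \cdots & s_{k_mk_m}
\end{pmatrix} \tag{7.4}
\]

Since $\varphi$ is symmetric, we have by (7.3),

\[ r_{k_ik_j} = s_{k_ik_j} - s_{1k_i}\,s_{1k_j}/s_{11} \qquad (i \ge 1,\ j \ge 1), \]

\[ 0 = s_{k_i 1} - s_{1k_i}\,s_{11}/s_{11} \qquad (i \ge 1). \]

Thus by a theorem in the theory of determinants, the determinants of (7.1) and (7.4) are equal.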

Lemma 7.2) Let $\varphi = (s_{ij})$ $(i, j = 1, \dots, r)$ be a symmetric matrix of positive type, and $s_{11} \ne 0$. Then there exists an $r$-row square matrix $\tau$ whose determinant is unity such that $\psi = (r_{ij}) = \tau\varphi$ has the properties stated in Lemma 7.1) and furthermore the minor of $r_{11}$ in $\psi$ is of positive type.

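To prove the positiveness of the minor of $r_{11}$, let the determinant of any one of its principal minors be

\[
M_1 = \begin{vmatrix}
r_{k_1k_1} & \cdots & r_{k_1k_m} \\
\vdots & & \vdots \\
r_{k_mk_1} & \cdots & r_{k_mk_m}
\end{vmatrix}
\qquad (2 \le k_1 \le \dots \le k_m \le r),
\]

where $r_{k_ik_j} = r_{k_jk_i}$ $(i, j = 1, \dots, m)$ due to the symmetry. Now consider the bordered determinant

\[
M_2 = \begin{vmatrix}
r_{11} & r_{1k_1} & \cdots & r_{1k_m} \\
0 & r_{k_1k_1} & \cdots & r_{k_1k_m} \\
\vdots & \vdots & & \vdots \\
0 & r_{k_mk_1} & \cdots & r_{k_mk_m}
\end{vmatrix}
\]

which by property (a) in Lemma 7.1) gives $M_2 = r_{11} M_1 = s_{11} M_1$. By property (c) in Lemma 7.1), $M_2$ is equal to the determinant of the form (7.4), which by hypothesis is positive. Thus $s_{11} M_1 \ge 0$. Since $s_{11} > 0$, we conclude that

\[ M_1 = M_2/s_{11} \ge 0 . \]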
Lemma 7.3). Let $\varphi = (s_{ij})$ $(i, j = 1, 2, \dots, r)$ be a symmetric matrix of properly positive type. Then there exists an $r$-row square matrix $\tau$ whose determinant is unity such that $\psi = (r_{ij}) = \tau\varphi$ has the properties stated in Lemma 7.1) and furthermore the minor of $r_{11}$ in $\psi$ is properly positive.

Since $\varphi$ is properly positive, we find that $s_{11} > 0$. The proof of this lemma is similar to that of Lemma 7.2).

Suppose that the set of observations $A_1, \dots, A_r$ is linearly independent. Then by Theorem 14, the Gramian matrix (6.1) is symmetric and properly positive, and hence $(A_1, A_1) > 0$. By Lemma 7.3), the matrix (6.1) can be reduced to the form

\[
\begin{pmatrix}
[A_1A_1\cdot 0] & [A_1A_2\cdot 0] & \cdots & [A_1A_r\cdot 0] \\
0 & [A_2A_2\cdot 1] & \cdots & [A_2A_r\cdot 1] \\
\vdots & \vdots & & \vdots \\
0 & [A_rA_2\cdot 1] & \cdots & [A_rA_r\cdot 1]
\end{pmatrix} \tag{7.5}
\]

where

\[ [A_sA_t\cdot 0] = (A_s, A_t) = [A_tA_s\cdot 0] \qquad (s, t = 1, \dots, r), \]

\[ [A_sA_t\cdot 1] = \frac{[A_1A_1\cdot 0]\,[A_sA_t\cdot 0] - [A_1A_s\cdot 0]\,[A_1A_t\cdot 0]}{[A_1A_1\cdot 0]} . \]

It is evident that $[A_1A_1\cdot 0] = (A_1, A_1) > 0$, since the matrix (6.1) is properly positive. By Lemma 7.3) the value of $D(\zeta)$ and the determinant of (7.5) are equal, and furthermore the minor of the element $[A_1A_1\cdot 0]$ is a symmetric matrix of properly positive type. Thus $[A_2A_2\cdot 1] > 0$, and $[A_sA_t\cdot 1] = [A_tA_s\cdot 1]$.

The minor of $[A_1A_1\cdot 0]$ surely satisfies all the conditions in Lemma 7.3). We may, therefore, apply a transformation of the form (7.2) to the minor of $[A_1A_1\cdot 0]$, and secure another matrix of the same character as (7.5). In other words, we may multiply on the left of the matrix (7.5) by

\[ \tau_2 = \delta + F_2 D_2 \tag{7.6} \]

where $D_2$ is the second row of the $r$-row identity matrix $\delta$, and

\[ F_2(n) = 0 \quad (n \le 2); \qquad F_2(n) = -\frac{[A_2A_n\cdot 1]}{[A_2A_2\cdot 1]} \quad (n > 2). \]

In general, let

\[ \tau_t = \delta + F_t D_t \qquad (t = 1, \dots, r - 1), \tag{7.7} \]

where $D_t$ is the $t$-th row of the $r$-row identity matrix $\delta$, and

\[ F_t(n) = 0 \quad (n \le t); \qquad F_t(n) = -\frac{[A_tA_n\cdot t-1]}{[A_tA_t\cdot t-1]} \quad (n > t). \tag{7.8} \]
Continued application of this type of transformation ultimately reduces the matrix (6.1) to the form

\[
\eta = \begin{pmatrix}
[A_1A_1\cdot 0] & [A_1A_2\cdot 0] & [A_1A_3\cdot 0] & \cdots & [A_1A_r\cdot 0] \\
0 & [A_2A_2\cdot 1] & [A_2A_3\cdot 1] & \cdots & [A_2A_r\cdot 1] \\
0 & 0 & [A_3A_3\cdot 2] & \cdots & [A_3A_r\cdot 2] \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & [A_rA_r\cdot r-1]
\end{pmatrix} \tag{7.9}
\]

where

\[ [A_tA_s\cdot h] = \frac{[A_hA_h\cdot h-1]\,[A_tA_s\cdot h-1] - [A_hA_t\cdot h-1]\,[A_hA_s\cdot h-1]}{[A_hA_h\cdot h-1]} \qquad (t, s = 1, \dots, r;\ 0 < h \le sm(t, s)).^{*} \tag{7.9$'$} \]

In the matrix (7.9), we see by virtue of Lemma 7.3) that $[A_iA_i\cdot i-1] > 0$ for every $i$, and $[A_tA_s\cdot h] = [A_sA_t\cdot h]$ for every $s$, $t$ and $0 \le h \le sm(t, s)$. If $h = sm(t, s)$, then $[A_tA_s\cdot h] = 0$.

Let $\tau = \tau_{r-1} \cdot \tau_{r-2} \cdots \tau_1$. Then by the associative law of multiplication of matrices, we see that

\[ \eta = (\tau_{r-1} \cdots \tau_1)\,\zeta = \tau\zeta . \tag{7.10} \]


Thus we prove 

Theorem 15. If the set of vectors $A_1, \dots, A_r$ is linearly independent, then there exists a square matrix $\tau$ of order $r$ such that $\tau\zeta$ is of the form (7.9) where all elements below the principal diagonal are 0; every element in the principal diagonal, $[A_tA_t\cdot t-1]$ $(t = 1, \dots, r)$, is greater than zero; and $[A_tA_s\cdot h] = [A_sA_t\cdot h]$ for $s, t = 1, \dots, r$, and $h < sm(t, s)$. Furthermore the determinants of the matrices (6.1) and (7.9) are equal.

We now prove the following lemma which will be useful in the later section. 

Lemma 7.4). If $[A_iA_i\cdot i-1]$ is different from zero for every $i > 0$, then for every pair of integers $(s, t)$, where $s, t = 1, \dots, r$, and $n \le sm(t, s)$, we have

a) $[A_tA_s\cdot n] = (A_t, A_s) - \displaystyle\sum_{i=1}^{n} \frac{[A_iA_t\cdot i-1]\,[A_iA_s\cdot i-1]}{[A_iA_i\cdot i-1]}$ ;

b) $[A_t(A_s + A_m)\cdot n] = [A_tA_s\cdot n] + [A_tA_m\cdot n]$, $(m = 1, \dots, r)$;

c) $[(cA_t)A_s\cdot n] = c\,[A_tA_s\cdot n]$, ($c$ = a constant).

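To prove a), take every pair $(s, t)$. We find the lemma is true for $n = 0$. Assuming it is true for every $(s, t)$ and for $n = h < sm(s, t)$, we find that $h + 1 \le sm(s, t)$, and

\[
\begin{aligned}
(A_t, A_s) - \sum_{i=1}^{h+1} \frac{[A_iA_t\cdot i-1]\,[A_iA_s\cdot i-1]}{[A_iA_i\cdot i-1]}
&= (A_t, A_s) - \sum_{i=1}^{h} \frac{[A_iA_t\cdot i-1]\,[A_iA_s\cdot i-1]}{[A_iA_i\cdot i-1]} - \frac{[A_{h+1}A_t\cdot h]\,[A_{h+1}A_s\cdot h]}{[A_{h+1}A_{h+1}\cdot h]} \\
&= [A_tA_s\cdot h] - \frac{[A_{h+1}A_t\cdot h]\,[A_{h+1}A_s\cdot h]}{[A_{h+1}A_{h+1}\cdot h]} = [A_tA_s\cdot h+1],
\end{aligned}
\]

for every $s$, $t$.

* $sm(i, j)$ read "the smaller one of $(i, j)$."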

Parts b) and c) are true for n = 0. Now make use of the equality in a) and 
prove by induction. 


VIII. Gauss's Method of Substitution and its Relation to Gram-Schmidt's Orthogonalization Process

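Let us write the set of observations in the form

\[
\alpha = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
\vdots & \vdots & & \vdots \\
a_{r1} & a_{r2} & \cdots & a_{rn}
\end{pmatrix} .
\]

Let the orthogonalized set also be written in the form

\[
\beta = \begin{pmatrix}
b_{11} & \cdots & b_{1n} \\
\vdots & & \vdots \\
b_{r1} & \cdots & b_{rn}
\end{pmatrix} .
\]

From Theorems 5 and 6, we find that there exists a transformation $\kappa$ given by an $r$-row square matrix such that $\beta = \kappa\alpha$. Thus by the associative law of multiplication of matrices, we have

\[ \beta\alpha' = (\kappa\alpha)\alpha' = \kappa(\alpha\alpha') . \]

Now the matrix $\alpha\alpha'$ is the Gramian matrix (6.1). Thus

\[ \beta\alpha' = \kappa\zeta . \tag{8.1} \]

The composite matrix $\beta\alpha'$ is of the form

\[
\begin{pmatrix}
(B_1, A_1) & (B_1, A_2) & \cdots & (B_1, A_r) \\
(B_2, A_1) & (B_2, A_2) & \cdots & (B_2, A_r) \\
\vdots & \vdots & & \vdots \\
(B_r, A_1) & (B_r, A_2) & \cdots & (B_r, A_r)
\end{pmatrix} \tag{8.2}
\]

By Theorems 5 and 6, we note that $(B_s, A_t) = 0$ for $s > t$ and $(B_s, A_s) = (B_s, B_s)$ for every $s$. Thus the preceding matrix can be written in the form

\[
\begin{pmatrix}
(B_1, B_1) & (B_1, A_2) & (B_1, A_3) & \cdots & (B_1, A_r) \\
0 & (B_2, B_2) & (B_2, A_3) & \cdots & (B_2, A_r) \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & (B_r, B_r)
\end{pmatrix} \tag{8.3}
\]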
We have proved the following theorem:

Theorem 16. Let $A_1, \dots, A_r$ be a set of vectors, and $B_1, \dots, B_r$ be the orthogonalized set; and let $\alpha = (a_{ij})$, $\beta = (b_{ij})$. Then there exists a square $r$-row matrix $\kappa$ such that $\beta = \kappa\alpha$, and $\kappa\alpha\alpha'$ is a matrix of the form (8.3) where all the elements below the principal diagonal are zeros and every element in the principal diagonal is positive. If the set $A_1, \dots, A_r$ is linearly independent, then every element in the principal diagonal is greater than zero.

Theorem 17. Let $A_1, \dots, A_r$ be a set of vectors and $B_1, \dots, B_r$ be the orthogonalized set; and let $\alpha = (a_{ij})$, $\beta = (b_{ij})$. Then $D(\beta\alpha') = D(\alpha\alpha')$.

For by equations (2.1), we note that $D(\kappa) = 1$. Thus

\[ D(\beta\alpha') = D(\kappa\alpha\alpha') = D(\kappa)\,D(\alpha\alpha') = D(\alpha\alpha') . \]

Theorem 18. If the set of vectors $A_1, \dots, A_r$ is linearly independent, the matrix $\kappa$ arising from Gram-Schmidt's orthogonalization process is identical with the matrix $\tau$ defined by (7.10).

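To prove this theorem, we first establish the following

Lemma 8.5): If the set $A_1, \dots, A_r$ be linearly independent, and $B_1, \dots, B_r$ be the orthogonalized set, then for every $t$, $h$, we have

\[ (B_h, A_t) = [A_hA_t\cdot h-1] . \]

By Theorem 10, the set $B_i$ is linearly independent, and hence $n(B_i) > 0$ for every $i$. The lemma is evidently true for every $t$ and $h = 1$. Assuming it is true for every $t$ and $h \le s$, we shall prove it is also true for every $t$ and $h = s + 1$. Now

\[ B_{s+1} = A_{s+1} - \sum_{i=1}^{s} \frac{(A_{s+1}, B_i)}{(B_i, B_i)}\,B_i = A_{s+1} - \sum_{i=1}^{s} \frac{[A_iA_{s+1}\cdot i-1]}{[A_iA_i\cdot i-1]}\,B_i . \]

Thus by the linear property of $(\ ,\ )$ we secure, for every $t$,

\[
\begin{aligned}
(B_{s+1}, A_t) &= \Bigl(A_{s+1} - \sum_{i=1}^{s} \frac{[A_iA_{s+1}\cdot i-1]}{[A_iA_i\cdot i-1]}\,B_i,\ A_t\Bigr) \\
&= (A_{s+1}, A_t) - \sum_{i=1}^{s} \frac{[A_iA_{s+1}\cdot i-1]}{[A_iA_i\cdot i-1]}\,(B_i, A_t) \\
&= (A_{s+1}, A_t) - \sum_{i=1}^{s} \frac{[A_iA_{s+1}\cdot i-1]\,[A_iA_t\cdot i-1]}{[A_iA_i\cdot i-1]} \\
&= [A_{s+1}A_t\cdot s]
\end{aligned}
\]

by virtue of Lemma 7.4).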

From this lemma, we conclude at once that the matrices (7.9) and (8.3) are equal. Thus by (8.1), we have

\[ \kappa\zeta = \eta = \tau\zeta , \quad \text{or} \quad (\kappa - \tau)\zeta = \omega . \]

Since $\zeta$ is non-singular (by Theorem 12), we have

\[ \omega\zeta^{-1} = (\kappa - \tau)\zeta\zeta^{-1} = (\kappa - \tau)\delta = \kappa - \tau , \]

which proves the theorem.

From Lemma 8.5), we have

Lemma 8.6). Let $L = (l_1, \dots, l_n)$. Suppose the set $A_1, \dots, A_r$ to be linearly independent, and $B_1, \dots, B_r$ to be the orthogonalized set. Then for every $h$,

\[ [A_hL\cdot h-1] = (B_h, L) . \]

Theorems 16, 17, and 18 furnish us a new method for finding the most probable values of the unknowns in the theory of least squares. The formulation of the system of normal equations may be omitted in this new procedure, which may be described briefly as follows: After we obtain a set of observations $A_1, \dots, A_r$, orthogonalize this set by means of Gram-Schmidt's process. Let $L = (l_1, \dots, l_n)$ be a non-zero vector. The product

\[
\begin{pmatrix}
b_{11} & \cdots & b_{1n} \\
\vdots & & \vdots \\
b_{r1} & \cdots & b_{rn}
\end{pmatrix}
\begin{pmatrix}
a_{11} & \cdots & a_{r1} & -l_1 \\
\vdots & & \vdots & \vdots \\
a_{1n} & \cdots & a_{rn} & -l_n
\end{pmatrix}
\]

will give us the result as desired by Gauss's method of substitution.
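
The agreement asserted by Theorem 18 can be checked numerically. The following is a minimal sketch, not the author's procedure: it assumes unnormalized Gram-Schmidt (so that the transformation has determinant 1) and ordinary forward elimination without pivoting on the Gramian matrix. The function names `gram_schmidt` and `gauss_reduce` and the sample vectors are illustrative assumptions.

```python
from fractions import Fraction


def dot(x, y):
    """Inner product (x, y) of two vectors."""
    return sum(a * b for a, b in zip(x, y))


def gram_schmidt(A):
    """Unnormalized Gram-Schmidt.

    Returns (B, kappa), where the rows of B are the orthogonalized vectors
    and kappa is the unit lower-triangular matrix with B = kappa * A.
    """
    r = len(A)
    B, kappa = [], []
    for s in range(r):
        b = list(A[s])
        k_row = [Fraction(int(j == s)) for j in range(r)]
        for i in range(s):
            c = dot(A[s], B[i]) / dot(B[i], B[i])
            b = [x - c * y for x, y in zip(b, B[i])]
            k_row = [x - c * y for x, y in zip(k_row, kappa[i])]
        B.append(b)
        kappa.append(k_row)
    return B, kappa


def gauss_reduce(zeta):
    """Forward elimination on the Gramian matrix zeta.

    Returns (eta, tau) with eta = tau * zeta upper triangular and tau the
    accumulated product of the elementary transformations.
    """
    r = len(zeta)
    eta = [list(row) for row in zeta]
    tau = [[Fraction(int(i == j)) for j in range(r)] for i in range(r)]
    for t in range(r - 1):
        for n in range(t + 1, r):
            m = eta[n][t] / eta[t][t]
            eta[n] = [x - m * y for x, y in zip(eta[n], eta[t])]
            tau[n] = [x - m * y for x, y in zip(tau[n], tau[t])]
    return eta, tau


# Hypothetical, linearly independent observations (exact rational arithmetic).
A = [[Fraction(v) for v in row] for row in ([1, 1, 0], [1, 0, 1], [0, 1, 1])]
zeta = [[dot(a, b) for b in A] for a in A]            # Gramian matrix
B, kappa = gram_schmidt(A)
eta, tau = gauss_reduce(zeta)
beta_alpha_t = [[dot(b, a) for a in A] for b in B]    # the product beta * alpha'
```

For these sample vectors the two transformations coincide and the product of the orthogonalized set with the transposed observations reproduces the reduced Gramian matrix, so the normal equations never need to be formed explicitly.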

Academia Sinica, 

Peiping, China, 



A NOTE ON THE ANALYSIS OF VARIANCE*


By Solomon Kullback


By considering a set of independent items classified in some relevant manner into $N$ sets of $s$ items each, and by the use of a dispersion theorem of Prof. J. L. Coolidge,^2 Prof. H. L. Rietz^3 arrives at estimates of variance, used by Dr. R. A. Fisher, without making use of arguments involving the number of degrees of freedom of the items concerned.

By proceeding along the lines followed by Coolidge and Rietz but considering a set of independent items classified into $N$ sets of $s_i$ $(i = 1, 2, \dots, N)$ items each, we shall arrive at certain other important results of R. A. Fisher^4 in his analysis of variance.

The theorem referred to above is as follows: If $n$ independent quantities $y_1, y_2, \dots, y_n$ be given, their expected values being $a_1, a_2, \dots, a_n$, while the expected values of their squares are $A_1, A_2, \dots, A_n$, respectively, and if we agree to set $\bar{y} = (1/n)\sum_{i=1}^{n} y_i$, $a = (1/n)\sum_{i=1}^{n} a_i$, then the expected value of the variance, $(1/n)\sum_{i=1}^{n} (y_i - \bar{y})^2$, is

\[ \frac{n-1}{n^2} \sum_{i=1}^{n} (A_i - a_i^2) + \frac{1}{n} \sum_{i=1}^{n} (a_i - a)^2 . \tag{1} \]

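Suppose a set of independent items has been classified in some relevant manner into $N$ sets of $s_i$ $(i = 1, 2, \dots, N)$ items each as follows:

\[
\begin{array}{ll}
x_{11},\ x_{12},\ \dots,\ x_{1s_1}; & \bar{x}_1 \\
x_{21},\ x_{22},\ \dots,\ x_{2s_2}; & \bar{x}_2 \\
\quad\vdots & \ \vdots \\
x_{N1},\ x_{N2},\ \dots,\ x_{Ns_N}; & \bar{x}_N
\end{array}
\qquad \bar{x} \tag{2}
\]

where $\bar{x}_i$ $(i = 1, 2, \dots, N)$ is the arithmetic mean of the $i$-th set and $\bar{x}$ the mean of the pooled sample of $s = s_1 + s_2 + \dots + s_N$ items.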

We shall assume that the set (2) is statistically homogeneous in the sense that, using $E(\ )$ for the expected value of the expression in the parenthesis, we may let $E(x_{ij}) = a$, $E(x_{ij}^2) = A$ $(i = 1, 2, \dots, N;\ j = 1, 2, \dots, s_i)$.

* Presented to the American Mathematical Society, February 23, 1935.

^2 Bulletin Am. Math. Soc., Vol. 27 (1921) p. 439.

^3 Bulletin Am. Math. Soc., Vol. 38 (1932) pp. 731-735.

^4 Proceedings of the International Math. Congress, Toronto, 1924, Vol. 2, p. 802 ff.

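Then, using (1),

\[ E\Bigl(\sum_{j=1}^{s_i} (x_{ij} - \bar{x}_i)^2\Bigr) = (s_i - 1)(A - a^2) . \tag{3} \]

Summing (3) from $i = 1$ to $N$, we have

\[ E\Bigl(\sum_{i=1}^{N} \sum_{j=1}^{s_i} (x_{ij} - \bar{x}_i)^2\Bigr) = (A - a^2) \sum_{i=1}^{N} (s_i - 1) = (s - N)(A - a^2) . \tag{4} \]

Similarly, by using (1),

\[ E \sum_{i=1}^{N} s_i (\bar{x}_i - \bar{x})^2 = \frac{N-1}{N} \sum_{i=1}^{N} s_i \bigl[E(\bar{x}_i^2) - a^2\bigr] . \tag{5} \]

But^5

\[ E(\bar{x}_i^2) - a^2 = E(\bar{x}_i - a)^2 , \quad \text{and} \tag{6} \]

\[ E(\bar{x}_i - a)^2 = (A - a^2)/s_i , \quad \text{therefore} \tag{7} \]

\[ E \sum_{i=1}^{N} s_i (\bar{x}_i - \bar{x})^2 = (N - 1)(A - a^2) . \tag{8} \]

Similarly, by using (1),

\[ E\Bigl(\sum_{i=1}^{N} \sum_{j=1}^{s_i} (x_{ij} - \bar{x})^2\Bigr) = (s - 1)(A - a^2) . \tag{9} \]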

Thus, in a statistically homogeneous set of items, classified as in (2), the following estimates of variance have the same expected value:

\[
\begin{aligned}
V &= \frac{S}{s - 1}, & \text{where } S &= \sum_{i=1}^{N} \sum_{j=1}^{s_i} (x_{ij} - \bar{x})^2 ; \\
V_1 &= \frac{S_1}{s - N}, & \text{where } S_1 &= \sum_{i=1}^{N} \sum_{j=1}^{s_i} (x_{ij} - \bar{x}_i)^2 ; \\
V_2 &= \frac{S_2}{N - 1}, & \text{where } S_2 &= \sum_{i=1}^{N} s_i (\bar{x}_i - \bar{x})^2 .
\end{aligned} \tag{10}
\]

These estimates are used in applying the analysis of variance to the study of the correlation ratio, $\eta$, for uncorrelated material, where $\eta^2 = S_2/S$.
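
The equality of the three expected values can be confirmed exactly for a small case. The sketch below is illustrative only: it assumes items that independently take the values 0 or 1 with equal probability (so that $a = 1/2$, $A = 1/2$ and $A - a^2 = 1/4$), classified into two sets of sizes 2 and 3, and it averages each estimate over all equally likely outcomes. The function names and the subscripts `V1`, `V2` are assumptions of this sketch.

```python
from fractions import Fraction
from itertools import product


def estimates(groups):
    """Return (V, V1, V2): total, within-set and between-set estimates of variance."""
    items = [x for g in groups for x in g]
    s, N = len(items), len(groups)
    grand = Fraction(sum(items), s)                        # mean of pooled sample
    means = [Fraction(sum(g), len(g)) for g in groups]     # mean of each set
    S = sum((x - grand) ** 2 for x in items)
    S1 = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    S2 = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    return S / (s - 1), S1 / (s - N), S2 / (N - 1)


def expected_estimates(sizes, values=(0, 1)):
    """Exact expectations of (V, V1, V2) over all equally likely outcomes."""
    totals = [Fraction(0)] * 3
    count = 0
    for outcome in product(values, repeat=sum(sizes)):
        groups, start = [], 0
        for size in sizes:
            groups.append(outcome[start:start + size])
            start += size
        for k, v in enumerate(estimates(groups)):
            totals[k] += v
        count += 1
    return [t / count for t in totals]


# Two sets of unequal size, as in the classification (2).
expected = expected_estimates((2, 3))
```

In this example each of the three averages equals $A - a^2 = 1/4$, which is what the equal-expectation statement requires when the sets are of unequal size.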

Office of the Chief Signal Officer,

Washington, D. C.


^5 Rietz, H. L., loc. cit. p. 733.



A PROBLEM INVOLVING THE LEXIS THEORY OF DISPERSION 

By Walter A. Hendricks 

The attention of the author was recently directed to a study of the hatchability of chicken eggs at the U. S. Animal Husbandry Experiment Station,
Beltsville, Maryland. It was necessary to find the average hatchability of the 
fertile eggs incubated for each of a number of lots of birds and the corresponding 
standard errors of those averages. 

It was very apparent that some methods for computing such values, in com- 
mon use at the present time, do not give satisfactory results. This is due to the 
fact that the fertile eggs produced by different birds vary considerably with 
respect to hatchability as well as with respect to number of eggs available for 
incubation. It seems reasonable to suppose that the variability in hatch- 
ability of a number of fertile eggs, produced by a given number of birds, should 
obey the Lexis law of dispersion. This supposition is based on two hypotheses: 

(a) The probability that a fertile egg will hatch is constant for all fertile eggs 
produced by the same bird during the time interval under consideration. 

(b) The probability that a fertile egg will hatch varies from bird to bird. 

The reader familiar with the principles of genetics may question the validity 

of the first of these hypotheses. The probability that a fertile egg will hatch is 
largely governed by the genes carried by the chromosomes of the ovum of the 
hen and the sperm of the male bird which fertilized that ovum. The kinds of 
genes carried by various ova and spermatozoa are not necessarily the same, even 
when those ova and spermatozoa are produced by the same female and male 
birds, respectively. However, if we have a sample of a number of fertile eggs 
produced by the same hen, we are justified in assuming that the proportion of 
those eggs which will hatch is constant, except for sampling fluctuations, when 
successive samples of fertile eggs produced by the given hen are incubated, pro- 
vided, of course, that the eggs in the successive samples were fertilized by the 
same male bird or birds. The limit approached by the proportion of fertile eggs 
which hatch as the number of fertile eggs produced by the given hen becomes 
infinitely large may be defined as the probability that a fertile egg produced by 
that hen will hatch. It will be recognized that this definition is based on purely 
academic considerations, since there are physical limitations to the number of 
fertile eggs which a hen can produce in a given period of time. Hypotheses (a)
and (b) are to be interpreted in the light of this definition of the probability that
a fertile egg produced by a given bird will hatch. 

Let $s_1, s_2, \dots, s_n$ represent the numbers of fertile eggs produced by $n$ birds during a period of time and let $f_1, f_2, \dots, f_n$, respectively, represent the numbers of chicks obtained from those eggs when the eggs are incubated. Let $p_k = f_k/s_k$ represent the hatchability of the fertile eggs produced by the $k$-th bird.

The squared standard error of $p_k$ is given by the Lexis formula^1

\[ \sigma_{p_k}^2 = \frac{PQ}{s_k} + \frac{s_k - 1}{n s_k} \sum_{i=1}^{n} (P_i - P)^2 \tag{1} \]

in which the $P_i$ represent the respective probabilities that the fertile eggs produced by the $n$ birds will hatch, $P$ is the arithmetic mean of the $P_i$, and $Q$ is equal to $1 - P$.

The values of the probabilities, $P_i$, are not known. However, as a first approximation to equation (1) we may write:

\[ \sigma_{p_k}^2 = \frac{pq}{s_k} + \frac{s_k - 1}{n s_k} \sum_{i=1}^{n} (p_i - p)^2 \tag{2} \]

in which $p$ is the arithmetic mean of the $p_i$ and $q$ is equal to $1 - p$.

The product, $pq$, can be accepted as a reasonably close approximation to the product, $PQ$, but the expression, $\sum_{i=1}^{n} (p_i - p)^2$, will, in general, be greater than the expression, $\sum_{i=1}^{n} (P_i - P)^2$. The reason for this is apparent when we consider that if each of these two expressions is divided by $n$, the former yields an estimate of the squared standard deviation of the $p_i$ while the latter yields an estimate of the squared standard deviation of the $P_i$. The standard deviation of the $p_i$ will, in general, be greater than that of the $P_i$ because the $p_i$ are more or less imperfect estimates of the $P_i$ and are, therefore, subject to sampling errors from which the $P_i$ are free.

We may write:

\[ \frac{1}{n} \sum_{i=1}^{n} (P_i - P)^2 = \frac{1}{n} \sum_{i=1}^{n} (p_i - p)^2 - \sigma_c^2 \tag{3} \]

in which $\sigma_c^2$ is an appropriate correction as yet undefined.

Since the $p_i$ would approach the $P_i$ as statistical limits if each of the $s_i$ were made extremely large, it follows that $\sigma_c^2$ must approach zero as each of the $s_i$ approaches infinity. Furthermore, if $P_1 = P_2 = \dots = P_n = P$, we must have:

\[ \frac{1}{n} \sum_{i=1}^{n} (p_i - p)^2 - \sigma_c^2 = 0 \]

or

\[ \sigma_c^2 = \frac{1}{n} \sum_{i=1}^{n} (p_i - p)^2 . \tag{4} \]

^1 The formula as given in this paper is a modification of that given by Rietz, H. L. (1927) in his book, Mathematical Statistics, Open Court Publishing Co., Chicago, which was necessary in order to make it applicable to relative frequencies.
These conditions suggest that $\sigma_c^2$ be defined by the equation:

\[ \sigma_c^2 = \frac{pq}{n} \sum_{i=1}^{n} \frac{1}{s_i} . \tag{5} \]

If $\sigma_c^2$ is so defined, it will obviously approach zero as each of the $s_i$ approaches infinity. Furthermore, it has been shown by Yule^2 that if we have a series of $n$ relative frequencies, such as the $p_i$ under discussion, based on $n$ samples of unequal size, and the probabilities of the occurrence and non-occurrence, respectively, of the particular event under consideration are constant from sample to sample, the squared standard deviation of those relative frequencies is given by a relation such as that used to define $\sigma_c^2$ in equation (5). Therefore, the second condition is also satisfied. $\sigma_c^2$ may be interpreted as representing that part of the squared standard deviation of the $p_i$ which is due to the unreliability of the $p_i$ as estimates of the $P_i$.

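Therefore, it seems reasonable to write:

\[ \frac{1}{n} \sum_{i=1}^{n} (P_i - P)^2 = \frac{1}{n} \sum_{i=1}^{n} (p_i - p)^2 - \frac{pq}{n} \sum_{i=1}^{n} \frac{1}{s_i} . \tag{6} \]

Combining equations (1) and (6), we obtain the following formula for calculating the squared standard error of $p_k$:

\[ \sigma_{p_k}^2 = \frac{pq}{s_k} + \frac{s_k - 1}{n s_k} \Bigl[\sum_{i=1}^{n} (p_i - p)^2 - pq \sum_{i=1}^{n} \frac{1}{s_i}\Bigr] . \tag{7} \]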

Since the weight of a measurement is inversely proportional to the square of the standard error of the measurement, we are now in a position to calculate a weighted mean, $\bar{p}$, of the $p_i$:

\[ \bar{p} = \frac{\sum_{i=1}^{n} w_i p_i}{\sum_{i=1}^{n} w_i} \tag{8} \]

in which:

\[ w_i = \frac{1}{\sigma_{p_i}^2} . \tag{9} \]

The squared standard error of $\bar{p}$ is given by the familiar formula:

\[ \sigma_{\bar{p}}^2 = \frac{\sum_{i=1}^{n} w_i (p_i - \bar{p})^2}{(n - 1) \sum_{i=1}^{n} w_i} . \tag{10} \]


^2 Yule, G. Udny, 1927. Introduction to the Theory of Statistics, Charles Griffin and Co., London.
It would seem that $\bar{p}$ may be accepted as a good estimate of the average hatchability of the fertile eggs produced by the given lot of birds, and that
equation (10) may be used to obtain a valid estimate of the reliability of $\bar{p}$.
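
The chain of calculations in equations (7) to (10) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the author's computing routine: the function name `lexis_weighted_mean` and the egg and chick counts are hypothetical, and the check that every squared standard error from (7) is positive is an added safeguard, since the bracket in (7) can be negative for some data.

```python
def lexis_weighted_mean(eggs, chicks):
    """Weighted mean hatchability and its standard error.

    eggs[i]   -- number of fertile eggs s_i from bird i
    chicks[i] -- number of chicks f_i hatched from those eggs
    Returns (weighted mean, standard error of the weighted mean,
    list of squared standard errors of the individual p_i).
    """
    n = len(eggs)
    p_i = [f / s for f, s in zip(chicks, eggs)]
    p = sum(p_i) / n                      # arithmetic mean of the p_i
    q = 1.0 - p

    # Bracket of equation (7): sum of squares corrected for sampling error.
    corrected = sum((x - p) ** 2 for x in p_i) - p * q * sum(1.0 / s for s in eggs)

    # Equation (7): squared standard error of each p_k.
    var_i = [p * q / s + (s - 1) / (n * s) * corrected for s in eggs]
    if min(var_i) <= 0:
        raise ValueError("equation (7) gave a non-positive squared standard error")

    w = [1.0 / v for v in var_i]          # equation (9)
    p_bar = sum(wi * x for wi, x in zip(w, p_i)) / sum(w)              # equation (8)
    var_p_bar = sum(wi * (x - p_bar) ** 2 for wi, x in zip(w, p_i)) / ((n - 1) * sum(w))
    return p_bar, var_p_bar ** 0.5, var_i                              # equation (10)


# Hypothetical lot of three birds with unequal numbers of fertile eggs.
p_bar, se, var_i = lexis_weighted_mean([20, 50, 35], [18, 30, 28])

# With equal numbers of eggs the weights are equal and the weighted mean
# reduces to the arithmetic mean of the p_i.
p_equal, se_equal, _ = lexis_weighted_mean([40, 40, 40], [30, 28, 20])
```

When every bird supplies the same number of eggs, equation (7) gives identical weights, so the weighted mean and its standard error reduce to the ordinary unweighted values.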

However, the problem is not quite so simple. In the first place, there is usually a small amount of positive correlation between the number of fertile eggs produced by a bird and the hatchability of those eggs. Secondly, as pointed out earlier in this paper, the hatchability of fertile eggs is influenced to some extent by the male birds used to fertilize the eggs. The error involved in neglecting the correlation between hatchability and number of fertile eggs incubated does not seem to be of much importance in those practical problems which have come to the author's attention. The effects of differences among the male birds may be largely obviated in experimental work by frequently transferring male birds from lot to lot during the experimental period.

The best test of the suitability of a particular formula for calculating the 
standard error of an average is to compare the value of the standard error 
calculated by means of the formula with the corresponding value obtained by 
direct calculation from the distribution of a number of such averages obtained 
under essentially the same conditions. The accompanying table gives the
standard error of the weighted average hatchability of fertile eggs calculated 
for each of four lots of birds by means of equation (10), together with the corre- 
sponding values obtained from the distribution of averages. The former are 
designated as the “predicted” values and the latter are designated as the 
“observed” values. In the calculation of the “observed” values, the various 
averages were assigned the same weights which were used in the calculation 
of the “predicted” values. 


Comparison of "predicted" and "observed" standard errors of the weighted average hatchability of fertile eggs, calculated for each of four lots of birds

Lot    Weighted average     Standard error of weighted average
       hatchability         "Predicted"      "Observed"

 1        0.7684              0.0287           0.0327
 2        0.7115              0.0533           0.0561
 3        0.6834              0.0355           0.0379
 4        0.7260              0.0615           0.0674


The data used in these calculations involved a total of 74 birds, approximately equally divided among the four lots, and a total of 2,901 fertile eggs which were produced and incubated during an experimental period of 48 weeks. The agreement between the "predicted" and "observed" standard errors of the weighted average hatchability for each lot of birds is excellent. However, the author's experience with biological data tends to make him doubt that such close agreement will always be found when such data are subjected to the above treatment. The agreement in the present illustration could be less close without indicating that the method of calculating the "predicted" standard errors is unsound.

Bureau of Animal Industry,

U. S. Department of Agriculture,

Washington, D. C. 



A METHOD FOR DETERMINING THE COEFFICIENTS OF A 
CHARACTERISTIC EQUATION 

By Paul Horst


For the characteristic equation 


Oil — a: • 

* 

flrtl 

• ann ^ 


(-l)”(a;" - _ . . . ^ c„) 


s (a; - a:i)(a; — aa) • • • (a: - «„) 


it is well known that 


Ct = A, 


( 1 ) 


where $A_i$ is the sum of all $i$-th order co-axial minors of the determinant

\[
A = \begin{vmatrix}
a_{11} & \cdots & a_{1n} \\
\vdots & & \vdots \\
a_{n1} & \cdots & a_{nn}
\end{vmatrix} . \tag{2}
\]

If $n$ exceeds 3 or 4, the process of calculating all possible principal minors is
very cumbersome. 

But another more systematic method of calculating the c’s may be adopted. 
Suppose we define 


A” = 


1 


■,{p) 


... 

“nl “nn 


and 


It may be proved^ that 






(3) 


(4) 


(5) 


But from Newton's identities^2 we have

\[ S_p + c_1 S_{p-1} + c_2 S_{p-2} + \dots + c_{p-1} S_1 + p\,c_p = 0 . \tag{6} \]

^1 Muir, T. & Metzler, W. H., "A Treatise on the Theory of Determinants," p. 606, §§ 650 and 651.

^2 Dickson, L. E., "First Course in the Theory of Equations," p. 134, § 106.
Newton's identities are ordinarily employed for calculating the sums of the
powers of the roots of a polynomial when the coefficients are known. They may 
be employed equally well, however, for calculating the coefficients when the 
sums of the powers are given. Thus by means of equations (5) and (6) the 
coefficients of (1) may be readily calculated. 
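
The use of Newton's identities in this reverse direction can be sketched as follows. The sketch rests on two assumptions that should be checked against the equations above: the polynomial is taken in the form $x^n + c_1 x^{n-1} + \dots + c_n$, which is the convention under which identity (6) holds as written, and $S_p$ is taken to be the sum of the principal diagonal elements of the $p$-th power of the matrix, which equals the sum of the $p$-th powers of the roots. The function names and sample matrices are illustrative.

```python
from fractions import Fraction


def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def char_poly_coeffs(A):
    """Coefficients c_1, ..., c_n of x^n + c_1 x^(n-1) + ... + c_n.

    Uses S_p = trace(A^p) together with Newton's identities
        S_p + c_1 S_(p-1) + ... + c_(p-1) S_1 + p c_p = 0.
    """
    n = len(A)
    power = [row[:] for row in A]
    S = []                                   # S[p-1] holds S_p
    for _ in range(n):
        S.append(sum(power[i][i] for i in range(n)))
        power = mat_mul(power, A)

    c = []                                   # c[p-1] holds c_p
    for p in range(1, n + 1):
        acc = S[p - 1] + sum(c[k - 1] * S[p - k - 1] for k in range(1, p))
        c.append(Fraction(-acc) / p)
    return c


# Symmetric 2 x 2 example with roots 1 and 3: x^2 - 4x + 3.
coeffs_2 = char_poly_coeffs([[2, 1], [1, 2]])
# Diagonal example with roots 1, 2, 3: x^3 - 6x^2 + 11x - 6.
coeffs_3 = char_poly_coeffs([[1, 0, 0], [0, 2, 0], [0, 0, 3]])
```

Only the diagonal elements of each successive power are needed for the sums, although every element of a power must be carried forward to form the next one.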

If in (2) $a_{ij} = a_{ji}$, the calculation of the successive values is straightforward. The determinant $A$ is used as a constant multiplier so that

\[ A \cdot A = A^2, \quad A \cdot A^2 = A^3, \quad \dots, \quad A \cdot A^{n-1} = A^n \]

and the multiplication is column by column. That is,



THE GENERALIZED PROBLEM OF CORRECT MATCHINGS

By Dwight W. Chapman 

A method common to many experimental and testing procedures in psychology 
and education is to require an individual to match, as best he can, members of 
one series of items with members of a second series of quite different items certain
of which are in some sense true apposites of items in the first series. Thus the
experimental psychology of personality has often investigated the ability of 
graphologists or laymen to pair samples of handwriting produced by a group of 
persons with, say, character-sketches of these same persons; and the excess of 
correct matchings thus produced over the number to be expected by chance has 
been used as evidence that the expressive movement of handwriting affords 
characteristics diagnostic of personal traits. Fortunately, the excesses experi- 
mentally obtained have often been so large as obviously to exclude the operation 
of chance alone. But many empirical results show small excesses only; and the 
interpretation of such findings has not hitherto been subjected to rigid statistical 
analysis. 

The particular statistical problem resident in this experimental procedure is 
twofold, involving the estimation of the significance of (a) a given number of cor- 
rect matchings produced by one individual, and (b) a given mean number of cor- 
rect matchings produced by a group of individuals working with the same mate- 
rial independently. 

Furthermore, two cases arise in practice: (1) the two series of items are of 
equal length, and each item in either series has a true apposite in the other series ; 
or (2) the two series may be of unequal length, in which case the longer series 
contains not only a true apposite for each item of the shorter series, but, in 
addition, a certain number of extra, irrelevant items which cannot be correctly 
matched with any items in the shorter series. I have already given the solution 
to problems (a) and (b) for case (1).^ But case (1) forms only a corollary of the 
more general case (2), to the solution of which this present paper is devoted. 

(a) The Significance of a Given Number of Correct Matchings Resulting 

from a Single Trial 

Let there be given a series of $u$ $x$-items,

\[ x_1, x_2, \dots, x_u , \]

and a series of $t$ $y$-items,

\[ y_1, y_2, \dots, y_t . \]

^1 The Statistics of the Method of Correct Matchings, Amer. Jour. Psychol., 46, 1934, 287-298.
Let $t \le u$, and let the first $t$ $x$-items be in some sense true apposites of the correspondingly numbered $y$-items, so that if $y_j$ be paired with $x_j$ $(j = 1, 2, \dots, t)$, this pairing will constitute a correct matching.

The first problem which arises is that of determining the probability that a single random arrangement of the $t$ $y$-items against $t$ of the $x$-items will result in exactly $s$ $(= 0, 1, 2, \dots, t)$ correct matchings.

We begin by putting the first $s$ $y$-items in correspondence with their apposite $x$-items. Then the number of arrangements of the $t$ $y$-items in which only these $s$ are correctly matched is the number of arrangements of the remaining $t - s$ $y$-items against the remaining $u - s$ $x$-items such that no correct matchings occur. With respect to these items, let

$n$ = the number of all possible arrangements,

$n(Y_j)$ = the number of arrangements such that at least the $j$-th item is correctly matched with its apposite,

$n(Y_jY_k)$ = the number of arrangements such that at least both the $j$-th and $k$-th items are matched with their apposites, etc.;

and let

$n(\bar{Y}_j)$ = the number of arrangements such that at least the $j$-th item is not matched with its apposite,

$n(\bar{Y}_j\bar{Y}_k)$ = the number of arrangements such that at least the $j$-th and $k$-th items are not matched with their apposites, etc.

We have then to evaluate the expression $n(\bar{Y}_{s+1}\bar{Y}_{s+2} \cdots \bar{Y}_t)$, the number of arrangements of the items remaining, after setting $s$ of them correctly matched, such that no further correct matchings occur.

Now it can be shown that^2

\[
\begin{aligned}
n(\bar{Y}_{s+1}\bar{Y}_{s+2} \cdots \bar{Y}_t) = n
&- [n(Y_{s+1}) + n(Y_{s+2}) + \dots + n(Y_t)] \\
&+ [n(Y_{s+1}Y_{s+2}) + n(Y_{s+1}Y_{s+3}) + \dots + n(Y_{t-1}Y_t)] \\
&- [n(Y_{s+1}Y_{s+2}Y_{s+3}) + \dots + n(Y_{t-2}Y_{t-1}Y_t)] \\
&+ \dots \\
&+ (-1)^{t-s}\, n(Y_{s+1}Y_{s+2} \cdots Y_t) .
\end{aligned}
\]

The value of the expressions on the right side of this equation can be determined as follows:


^2 H. Whitney, A Logical Expansion in Mathematics, Bull. Amer. Math. Soc., 1932, 572-579.
The value of $n$ is the number of ways in which $t - s$ items can be arranged against $u - s$ items, which is

\[ \frac{(u - s)!}{[(u - s) - (t - s)]!} = \frac{(u - s)!}{(u - t)!} . \]

The value of the first bracket (the number of arrangements of these items such that some one of them is correctly matched) is derived by holding one of the items matched, which can be chosen in $t - s$ ways. This leaves $t - s - 1$ $y$-items, which can be arranged against the remaining $u - s - 1$ $x$-items in $(u - s - 1)!/(u - t)!$ ways. The product of these two expressions gives us for the value of the first bracket

\[ [n(Y_{s+1}) + \dots + n(Y_t)] = \frac{(t - s)!\,(u - s - 1)!}{1!\,(t - s - 1)!\,(u - t)!} . \]

To evaluate the second bracket, we hold two of the $t - s$ items matched, which can be chosen in $(t - s)!/[2!\,(t - s - 2)!]$ ways. There remain $t - s - 2$ $y$-items, which can be arranged against the remaining $u - s - 2$ $x$-items in $(u - s - 2)!/(u - t)!$ ways. The product of these two expressions gives us

\[ [n(Y_{s+1}Y_{s+2}) + \dots + n(Y_{t-1}Y_t)] = \frac{(t - s)!\,(u - s - 2)!}{2!\,(t - s - 2)!\,(u - t)!} . \]

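Continuing thus, we develop the following series for the number of arrangements of $t$ items against $u$ items such that only the first $s$ are correctly matched:

\[
n(\bar{Y}_{s+1} \cdots \bar{Y}_t) = \frac{(u - s)!}{(u - t)!} - \frac{(t - s)!\,(u - s - 1)!}{1!\,(t - s - 1)!\,(u - t)!} + \frac{(t - s)!\,(u - s - 2)!}{2!\,(t - s - 2)!\,(u - t)!} - \dots + (-1)^{t-s} \frac{(t - s)!\,(u - t)!}{(t - s)!\,0!\,(u - t)!} .
\]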

In order to express the number of arrangements such that any $s$ correct matchings occur, we must multiply the above series by $t!/[s!\,(t - s)!]$, which is the number of ways in which $s$ items can be chosen from $t$ items:

\[
N_{(s)} = \frac{t!}{s!\,(t - s)!} \left[ \frac{(u - s)!}{(u - t)!} - \frac{(t - s)!\,(u - s - 1)!}{1!\,(t - s - 1)!\,(u - t)!} + \dots + (-1)^{t-s} \frac{(t - s)!\,(u - t)!}{(t - s)!\,0!\,(u - t)!} \right] .
\]

And in order to obtain the probability that a single random arrangement will result in exactly $s$ correct matchings, we must further divide by $u!/(u - t)!$, which is the total number of ways in which $t$ items can be arranged against $u$ items. Calling this probability $P_{(s)}$, we have then

\[
P_{(s)} = \frac{t!\,(u - t)!}{u!\,s!\,(t - s)!} \left[ \frac{(u - s)!}{(u - t)!} - \frac{(t - s)!\,(u - s - 1)!}{1!\,(t - s - 1)!\,(u - t)!} + \dots + (-1)^{t-s} \frac{(t - s)!\,(u - t)!}{(t - s)!\,0!\,(u - t)!} \right] .
\]

Placing $(t - s)!/(u - t)!$ out of all terms in the bracket, the series simplifies to

\[
P_{(s)} = \frac{t!}{s!\,u!} \left[ \frac{(u - s)!}{0!\,(t - s)!} - \frac{(u - s - 1)!}{1!\,(t - s - 1)!} + \frac{(u - s - 2)!}{2!\,(t - s - 2)!} - \dots + (-1)^{t-s} \frac{(u - t)!}{(t - s)!\,0!} \right] . \tag{1}
\]

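In the practical situation, the significant question is not the probability of exactly $s$ correct matchings, but the probability of $s$ or more correct matchings,

\[ P_{(s \text{ or more})} = P_{(s)} + P_{(s+1)} + \dots + P_{(t)} , \]

whence, by equation (1),^3

\[
\begin{aligned}
P_{(s \text{ or more})} ={}& \frac{t!}{s!\,u!} \left[ \frac{(u - s)!}{0!\,(t - s)!} - \frac{(u - s - 1)!}{1!\,(t - s - 1)!} + \frac{(u - s - 2)!}{2!\,(t - s - 2)!} - \dots + (-1)^{t-s} \frac{(u - t)!}{(t - s)!\,0!} \right] \\
&+ \frac{t!}{(s + 1)!\,u!} \left[ \frac{(u - s - 1)!}{0!\,(t - s - 1)!} - \frac{(u - s - 2)!}{1!\,(t - s - 2)!} + \dots + (-1)^{t-s-1} \frac{(u - t)!}{(t - s - 1)!\,0!} \right] \\
&+ \frac{t!}{(s + 2)!\,u!} \left[ \frac{(u - s - 2)!}{0!\,(t - s - 2)!} - \dots + (-1)^{t-s-2} \frac{(u - t)!}{(t - s - 2)!\,0!} \right] \\
&+ \dots + \frac{t!}{t!\,u!} \left[ \frac{(u - t)!}{0!\,0!} \right] ,
\end{aligned} \tag{2}
\]

or, in a form better suited to practical computation from tables of factorials,

\[
\begin{aligned}
P_{(s \text{ or more})} = \frac{t!}{u!} \Biggl\{ & \frac{(u - s)!}{(t - s)!} \left[ \frac{1}{0!\,s!} \right]
+ \frac{(u - s - 1)!}{(t - s - 1)!} \left[ \frac{1}{0!\,(s + 1)!} - \frac{1}{1!\,s!} \right] \\
&+ \frac{(u - s - 2)!}{(t - s - 2)!} \left[ \frac{1}{0!\,(s + 2)!} - \frac{1}{1!\,(s + 1)!} + \frac{1}{2!\,s!} \right] + \dots \\
&+ \frac{(u - t)!}{0!} \left[ \frac{1}{0!\,t!} - \frac{1}{1!\,(t - 1)!} + \frac{1}{2!\,(t - 2)!} - \dots + (-1)^{t-s} \frac{1}{(t - s)!\,s!} \right] \Biggr\} .
\end{aligned} \tag{2$'$}
\]

^3 In the special case when the series of $x$-items and the series of $y$-items are of the same length, whence $t = u$, equation (1) reduces to

\[ P_{(s)} = \frac{1}{s!} \left[ \frac{1}{0!} - \frac{1}{1!} + \frac{1}{2!} - \dots + (-1)^{t-s} \frac{1}{(t - s)!} \right] . \]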
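
Equation (1) lends itself to a direct numerical check. The following sketch is illustrative rather than part of the original derivation: it evaluates (1) in exact rational arithmetic, sums it to obtain the probability of $s$ or more correct matchings, and compares it with a brute-force count over every arrangement of $t$ $y$-items against $u$ $x$-items. The function names `p_exact`, `p_at_least` and `p_brute` are assumptions of this sketch.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def p_exact(s, t, u):
    """Probability of exactly s correct matchings, by equation (1)."""
    bracket = sum(
        Fraction((-1) ** i * factorial(u - s - i),
                 factorial(i) * factorial(t - s - i))
        for i in range(t - s + 1)
    )
    return Fraction(factorial(t), factorial(s) * factorial(u)) * bracket


def p_at_least(s, t, u):
    """Probability of s or more correct matchings: P(s) + P(s+1) + ... + P(t)."""
    return sum(p_exact(k, t, u) for k in range(s, t + 1))


def p_brute(s, t, u):
    """Same probability as p_exact, found by listing every arrangement.

    y-item j is placed against x-item arr[j]; the matching is correct
    when arr[j] == j.
    """
    hits = total = 0
    for arr in permutations(range(u), t):
        total += 1
        if sum(1 for j, x in enumerate(arr) if j == x) == s:
            hits += 1
    return Fraction(hits, total)


# Distribution of s for t = 3 y-items against u = 5 x-items.
distribution = [p_exact(s, 3, 5) for s in range(4)]
```

The enumeration grows as $u!/(u - t)!$, so it serves only as a check for short series; equation (1) itself is cheap to evaluate for any lengths.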


(b) The Significance of a Given Mean Number of Correct Matchings Result- 
ing from n Independent Trials 

A frequent practical situation is that in which interest centers on the signifi- 
cance of the mean number of correct matchings achieved by a group of n indi- 
viduals working independently with the same two series. 

In order to determine the probability that the mean number of correct
matchings, $\bar{s}$, resulting from n independent trials shall equal or exceed a given value, we
are required to describe the distribution of the means of samples of size n drawn
at random from a parent population in which the variable is s (= 0, 1, 2, ..., t)
with relative frequencies $P_{(0)}, P_{(1)}, P_{(2)}, \ldots, P_{(t)}$, given by equation (1). The
tabulation of this parent distribution follows:

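Table I: Distribution of s

$$\begin{array}{c|l}
s & \text{Relative frequency } (= P_{(s)})\\ \hline
0 & \dfrac{t!}{0!\,u!}\left[\dfrac{u!}{0!\,t!} - \dfrac{(u-1)!}{1!\,(t-1)!} + \dfrac{(u-2)!}{2!\,(t-2)!} - \dfrac{(u-3)!}{3!\,(t-3)!} + \cdots + (-1)^{t}\dfrac{(u-t)!}{t!\,0!}\right]\\[2ex]
1 & \dfrac{t!}{1!\,u!}\left[\dfrac{(u-1)!}{0!\,(t-1)!} - \dfrac{(u-2)!}{1!\,(t-2)!} + \dfrac{(u-3)!}{2!\,(t-3)!} - \cdots + (-1)^{t-1}\dfrac{(u-t)!}{(t-1)!\,0!}\right]\\[2ex]
2 & \dfrac{t!}{2!\,u!}\left[\dfrac{(u-2)!}{0!\,(t-2)!} - \dfrac{(u-3)!}{1!\,(t-3)!} + \cdots + (-1)^{t-2}\dfrac{(u-t)!}{(t-2)!\,0!}\right]\\[1ex]
\vdots & \qquad\vdots\\[1ex]
t & \dfrac{t!}{t!\,u!}\left[\dfrac{(u-t)!}{0!\,0!}\right]
\end{array}$$
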









We now determine the first four moments, $\nu_1$, $\nu_2$, $\nu_3$, and $\nu_4$, of this distribution
about the origin s = 0. Since, in general,

$$\nu_k = \sum_{s=0}^{t}\left[s^k \times (\text{relative frequency of } s)\right] = \sum_{s=0}^{t} s^k P_{(s)},$$

the tabulation for the computation of any moment is as follows:
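Table II: The Computation of the kth Moment of the Distribution of s

$$\begin{aligned}
&1^k\,\frac{t!}{1!\,u!}\left[\frac{(u-1)!}{0!\,(t-1)!} - \frac{(u-2)!}{1!\,(t-2)!} + \frac{(u-3)!}{2!\,(t-3)!} - \cdots + (-1)^{t-1}\frac{(u-t)!}{(t-1)!\,0!}\right]\\
&2^k\,\frac{t!}{2!\,u!}\left[\frac{(u-2)!}{0!\,(t-2)!} - \frac{(u-3)!}{1!\,(t-3)!} + \cdots + (-1)^{t-2}\frac{(u-t)!}{(t-2)!\,0!}\right]\\
&3^k\,\frac{t!}{3!\,u!}\left[\frac{(u-3)!}{0!\,(t-3)!} - \cdots + (-1)^{t-3}\frac{(u-t)!}{(t-3)!\,0!}\right]\\
&\qquad\cdots\\
&t^k\,\frac{t!}{t!\,u!}\left[\frac{(u-t)!}{0!\,0!}\right]
\end{aligned}$$

Noting that

$$\frac{1^k}{1!} = \frac{1^{k-1}}{0!},\qquad \frac{2^k}{2!} = \frac{2^{k-1}}{1!},\qquad \ldots,\qquad \frac{t^k}{t!} = \frac{t^{k-1}}{(t-1)!},$$

and multiplying the terms in brackets by these factors, we develop Table III:

Table III

$$\begin{aligned}
&\frac{t!}{u!}\left[\frac{1^{k-1}(u-1)!}{0!\,0!\,(t-1)!} - \frac{1^{k-1}(u-2)!}{0!\,1!\,(t-2)!} + \frac{1^{k-1}(u-3)!}{0!\,2!\,(t-3)!} - \cdots + (-1)^{t-1}\frac{1^{k-1}(u-t)!}{0!\,(t-1)!\,0!}\right]\\
&\frac{t!}{u!}\left[\frac{2^{k-1}(u-2)!}{1!\,0!\,(t-2)!} - \frac{2^{k-1}(u-3)!}{1!\,1!\,(t-3)!} + \cdots + (-1)^{t-2}\frac{2^{k-1}(u-t)!}{1!\,(t-2)!\,0!}\right]\\
&\frac{t!}{u!}\left[\frac{3^{k-1}(u-3)!}{2!\,0!\,(t-3)!} - \cdots + (-1)^{t-3}\frac{3^{k-1}(u-t)!}{2!\,(t-3)!\,0!}\right]\\
&\qquad\cdots\\
&\frac{t!}{u!}\left[\frac{t^{k-1}(u-t)!}{(t-1)!\,0!\,0!}\right]
\end{aligned}$$

The 1st, 2nd, 3rd, ..., tth diagonals of the table run through the terms containing
$(u-1)!$, $(u-2)!$, $(u-3)!$, ..., $(u-t)!$ respectively.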
Since each series in brackets is one term shorter than the preceding series, the
table forms a system of t diagonals. The sum which gives us $\nu_k$ may therefore
be considered as the sum of these diagonals.

Now, from inspection, it is evident that the general diagonal is of the form

$$s\text{th diagonal} = \frac{t!\,(u-s)!}{u!\,(t-s)!}\left[\frac{s^{k-1}}{(s-1)!\,0!} - \frac{(s-1)^{k-1}}{(s-2)!\,1!} + \cdots + (-1)^{s-1}\frac{1^{k-1}}{0!\,(s-1)!}\right].$$

But it can be shown* that

$$\sum_{r=0}^{s-1} (-1)^r \binom{s-1}{r}(s-r)^{k-1} = 0 \qquad\text{when } k < s.$$

Whence

$$s\text{th diagonal} = 0 \qquad\text{when } k < s.$$

Therefore $\nu_k$ is given simply by the sum of the first k diagonals of Table III.
Or, in general,

$$\begin{aligned}
\nu_k ={}& \frac{t!\,(u-1)!}{u!\,(t-1)!}\left[\frac{1^{k-1}}{0!\,0!}\right] + \frac{t!\,(u-2)!}{u!\,(t-2)!}\left[\frac{2^{k-1}}{1!\,0!} - \frac{1^{k-1}}{0!\,1!}\right]\\
&+ \frac{t!\,(u-3)!}{u!\,(t-3)!}\left[\frac{3^{k-1}}{2!\,0!} - \frac{2^{k-1}}{1!\,1!} + \frac{1^{k-1}}{0!\,2!}\right] + \cdots\\
&+ \frac{t!\,(u-k)!}{u!\,(t-k)!}\left[\frac{k^{k-1}}{(k-1)!\,0!} - \frac{(k-1)^{k-1}}{(k-2)!\,1!} + \cdots + (-1)^{k-1}\frac{1^{k-1}}{0!\,(k-1)!}\right].
\end{aligned} \tag{3}$$

To this equation we must, of course, add the condition $k \le t$.


* E. Netto, Lehrbuch der Combinatorik, Leipzig, 1901, 249, Formula 17. 
Solving now for the first four moments, we have

$$\begin{aligned}
\nu_1 &= \frac{t}{u},\\
\nu_2 &= \frac{t}{u}\left[1 + \frac{t-1}{u-1}\right],\\
\nu_3 &= \frac{t}{u}\left[1 + 3\,\frac{t-1}{u-1} + \frac{(t-1)(t-2)}{(u-1)(u-2)}\right],\\
\nu_4 &= \frac{t}{u}\left[1 + 7\,\frac{t-1}{u-1} + 6\,\frac{(t-1)(t-2)}{(u-1)(u-2)} + \frac{(t-1)(t-2)(t-3)}{(u-1)(u-2)(u-3)}\right].
\end{aligned} \tag{4}$$

If now we define, for convenience,

$$a = \frac{t}{u},\qquad b = \frac{t-1}{u-1},\qquad c = \frac{t-2}{u-2},\qquad d = \frac{t-3}{u-3}, \tag{5}$$


we have, for the constants of the distribution of s,

$$\begin{aligned}
\text{Mean} &= \nu_1 = a,\\
\mu_2 &= \nu_2 - \nu_1^2 = a(1+b) - a^2, \qquad\text{whence}\quad \sigma = \sqrt{a(1+b) - a^2},\\
\mu_3 &= \nu_3 - 3\nu_1\nu_2 + 2\nu_1^3 = a(1+3b+bc) - 3a^2(1+b) + 2a^3,\\
\mu_4 &= \nu_4 - 4\nu_1\nu_3 + 6\nu_1^2\nu_2 - 3\nu_1^4\\
&= a(1+7b+6bc+bcd) - 4a^2(1+3b+bc) + 6a^3(1+b) - 3a^4.
\end{aligned} \tag{6}$$

From these constants we can determine the skewness and kurtosis of the
distribution of s,

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3}, \qquad\text{and}\qquad \beta_2 = \frac{\mu_4}{\mu_2^2}.$$


Now it is known that the means of samples of size n drawn from a parent
population with constants $\beta_1$ and $\beta_2$ are distributed in such a way that




$$\beta_{1(\text{means})} = \frac{\beta_1}{n}, \qquad\text{and}\qquad \beta_{2(\text{means})} - 3 = \frac{\beta_2 - 3}{n}. \tag{7}$$

Therefore, having determined the beta-constants for the distribution of s, we
can determine the beta-constants of the distribution of $\bar{s}$, the mean number of
correct matchings resulting from n independent trials.
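The constants of equations (5), (6) and (7) are simple enough to evaluate mechanically. The following is a minimal sketch, not part of the original paper; the names `matching_constants` and `betas_of_mean` are hypothetical, and u is assumed to be at least 4 so that the ratio d of equation (5) is defined.

```python
from fractions import Fraction


def matching_constants(t, u):
    """Mean, mu2, mu3, mu4 of the distribution of s, from equations (5) and (6)."""
    a = Fraction(t, u)
    b = Fraction(t - 1, u - 1)
    c = Fraction(t - 2, u - 2)
    d = Fraction(t - 3, u - 3)  # assumes u >= 4
    mu2 = a * (1 + b) - a**2
    mu3 = a * (1 + 3 * b + b * c) - 3 * a**2 * (1 + b) + 2 * a**3
    mu4 = (a * (1 + 7 * b + 6 * b * c + b * c * d)
           - 4 * a**2 * (1 + 3 * b + b * c)
           + 6 * a**3 * (1 + b) - 3 * a**4)
    return a, mu2, mu3, mu4


def betas_of_mean(t, u, n):
    """Beta-constants of the mean of n independent trials, from equation (7)."""
    _, mu2, mu3, mu4 = matching_constants(t, u)
    beta1 = mu3**2 / mu2**3
    beta2 = mu4 / mu2**2
    return beta1 / n, 3 + (beta2 - 3) / n


if __name__ == "__main__":
    print(matching_constants(5, 5))
    print(float(betas_of_mean(5, 8, 1)[0]))
```

Setting n = 1 in `betas_of_mean` returns the beta-constants of s itself, which is convenient for tabulating the effect of u at fixed t.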

Now when $t = u \ge 4$, we have

$$a = b = c = d = 1,$$

and equations (6) give us for the distribution of s

$$\text{Mean} = 1,\quad \mu_2 = 1,\quad \mu_3 = 1,\quad \mu_4 = 4, \qquad\text{whence}\qquad \beta_1 = 1,\quad \beta_2 = 4;$$

and therefore, for the constants of the distribution of $\bar{s}$, we have, by equations
(7),

$$\beta_1 = \frac{1}{n}, \qquad\text{and}\qquad \beta_2 = 3 + \frac{1}{n},$$

which indicates a positively skewed and leptokurtic distribution. The effect of
increasing u and holding t constant is to increase the skewness, as shown in the
following table for t = 5:


$$\begin{array}{ccc}
t & u & \beta_1\\ \hline
5 & 5 & 1/n\\
5 & 6 & 1.05/n\\
5 & 7 & 1.16/n\\
5 & 8 & 1.31/n\\
5 & 9 & 1.46/n
\end{array}$$


The degrees of skewness and kurtosis met with in practical cases of matching
with any considerable number of judges (n) are such that a Pearson Type III
distribution curve gives a reasonably good fit to the distribution of mean
numbers of correct matchings. If, therefore, we have to determine the significance
of any obtained mean number of correct matchings, we may resort to Salvosa's
tables† of the area under the Type III curve.

As a concrete example of the application of this method let us imagine that 10
judges have arranged 5 character sketches against 8 specimens of handwriting,
5 of which are true apposites of the sketches. Let the total number of correct
matchings achieved by this group be 12, whence the mean number per judge is
1.2. We have, then,

$$\bar{s} = 1.2, \qquad n = 10,$$

$$t = 5,\quad u = 8, \qquad\text{whence}\qquad a = \frac{t}{u} = .625,\quad b = \frac{t-1}{u-1} = .571,\quad c = \frac{t-2}{u-2} = .500.$$

We now find the mean, standard deviation, and $\beta_1$ of the distribution of $\bar{s}$, as
follows:

The mean of the distribution of $\bar{s}$ is, by sampling theory, the same as the mean
of the distribution of s:

$$\text{Mean} = a = .625.$$

The second moment of the distribution of $\bar{s}$ is, by sampling theory, $1/n$ times
the second moment of the distribution of s; whence, by equation (5),

$$\text{Standard deviation} = \sqrt{\tfrac{1}{10}\left[a(1+b) - a^2\right]} = .243.$$

And, by equations (5) and (7),

$$\beta_1 = \frac{1}{10}\cdot\frac{\left[a(1+3b+bc) - 3a^2(1+b) + 2a^3\right]^2}{\left[a(1+b) - a^2\right]^3} = .13.$$


Now the obtained mean number of correct matchings was 1.2, and the next
lower number which could have occurred (corresponding to a total of 11 instead
of 12 for the group of judges) is 1.1. The lower boundary of the class-interval
whose midpoint is $\bar{s} = 1.2$ is therefore 1.15; and it is the area above this boundary
under the curve of $\bar{s}$ in which we are interested.

† L. R. Salvosa, Tables of Pearson's Type III function, Ann. Math. Statist., 1, 1930,
191-198.
The deviation of this boundary from the mean of $\bar{s}$ is

$$1.15 - .625 = .525,$$

and this deviation expressed in terms of the standard deviation gives

$$\frac{.525}{.243} = 2.16.$$

Entering Salvosa's table for the deviation 2.16 and skewness $= \sqrt{\beta_1} = .36$, we
find by interpolation that so good a performance should be expected by chance
only about 23 times in 1000.
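The arithmetic of this example can be retraced, and the Type III approximation compared with an exact calculation, by convolving the distribution of s with itself n times. The sketch below is an illustration only: `p_exact` restates equation (1), `distribution_of_total` is a hypothetical helper, and the exact tail it prints is offered as a cross-check on the tabled value rather than as a figure from the paper.

```python
from fractions import Fraction
from math import factorial, sqrt


def p_exact(s, t, u):
    """P(s) from equation (1)."""
    series = sum(
        Fraction((-1) ** j * factorial(u - s - j),
                 factorial(j) * factorial(t - s - j))
        for j in range(t - s + 1)
    )
    return Fraction(factorial(t), factorial(s) * factorial(u)) * series


def distribution_of_total(t, u, n):
    """Exact distribution of the total number of correct matchings over n independent judges."""
    single = [p_exact(s, t, u) for s in range(t + 1)]
    total = [Fraction(1)]
    for _ in range(n):
        new = [Fraction(0)] * (len(total) + t)
        for i, p in enumerate(total):
            for s, q in enumerate(single):
                new[i + s] += p * q
        total = new
    return total


t, u, n, observed_total = 5, 8, 10, 12
a, b = t / u, (t - 1) / (u - 1)
sd_of_mean = sqrt((a * (1 + b) - a**2) / n)
# Lower class boundary 1.15, expressed in standard-deviation units.
deviate = (observed_total / n - 0.05 - a) / sd_of_mean
# Exact probability that the group total is 12 or more.
tail = sum(distribution_of_total(t, u, n)[observed_total:])
print(round(sd_of_mean, 3), round(deviate, 2), float(tail))
```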



MOMENTS ABOUT THE ARITHMETIC MEAN OF A BINOMIAL 
FREQUENCY DISTRIBUTION 

W. J. Kirkham, Oregon State College

Although the most useful moments of a binomial distribution have been 
derived as a function of the parameters of the generating binomial for any 
binomial frequency series, a generalization of notation and procedure is well 
worth our consideration. The problem attempted in this paper is the calcula- 
tion of the moments about the mean for the general frequency series of Table I. 


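TABLE I

The generalized binomial frequency series

$$\begin{array}{c|c}
x \text{ (item)} & f \text{ (frequency)}\\ \hline
0 & N\cdot{}_nC_0\,p^0q^n\\
1 & N\cdot{}_nC_1\,p^1q^{n-1}\\
2 & N\cdot{}_nC_2\,p^2q^{n-2}\\
\vdots & \vdots\\
n & N\cdot{}_nC_n\,p^nq^0
\end{array}$$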


In calculating the moments of a set of data about any value, it is often found
convenient to use an arbitrary origin, define the moments about this value, and
represent the desired moments in terms of those calculated. In the general
binomial series, the origin of the x's is found to be the best arbitrary origin.
These intermediate moments are

$$\nu_1 = \frac{\sum fx}{N} = M,\ \text{arithmetic mean};\qquad \ldots;\qquad \nu_n = \frac{\sum fx^n}{N}, \tag{1}$$

where $\nu_n$ is the nth moment.

The moments ($\mu$'s) about the mean are easily defined as functions of the $\nu$'s
from fundamental definitions of the $\mu$'s. Denoting the nth moment by $\mu_n$, we
have

$$\begin{aligned}
\mu_1 &= \frac{\sum f(x-M)}{N} = \nu_1 - \nu_1 = 0,\\
\mu_2 &= \frac{\sum f(x-M)^2}{N} = \nu_2 - \nu_1^2,\\
\mu_3 &= \frac{\sum f(x-M)^3}{N} = \nu_3 - 3\nu_2\nu_1 + 2\nu_1^3.
\end{aligned} \tag{2}$$

In general,

$$\mu_n = \nu_n - {}_nC_1\,\nu_{n-1}\nu_1 + {}_nC_2\,\nu_{n-2}\nu_1^2 - \cdots + (-1)^{n-1}\left({}_nC_{n-1} - 1\right)\nu_1^n. \tag{3}$$

Or, if we let $\{\nu\}^n = \nu_n$, we may express the nth moment by a simple notation:

$$\mu_n = \{\mu\}^n = \{\nu\}^n - {}_nC_1\{\nu\}^{n-1}\nu_1 + {}_nC_2\{\nu\}^{n-2}\nu_1^2 - \cdots = (\{\nu\} - \nu_1)^n. \tag{4}$$

Solving the equation for $\{\nu\}$,

$$\{\nu\} = \{\mu\} + \nu_1.$$

Raising both sides to the nth power and substituting for the "brace" notation,

$$\nu_n = \mu_n + {}_nC_1\,\mu_{n-1}\nu_1 + {}_nC_2\,\mu_{n-2}\nu_1^2 + \cdots + \nu_1^n.$$

Whence

$$\mu_n = \nu_n - {}_nC_1\,\mu_{n-1}\nu_1 - {}_nC_2\,\mu_{n-2}\nu_1^2 - \cdots - \nu_1^n, \tag{5}$$

a semi-recursion formula.

The original formula for $\mu_n$ contained n moments or variables; and since there
are only (n − 2) of the $\mu$'s which are of lower order than $\mu_n$, it is necessary to
retain $\nu_n$ and $\nu_1$ in (5). Since $\mu_1 = 0$, one term in the expansion of $\mu_n$ is zero.
For instance, when n = 5, we have

$$\mu_5 = \nu_5 - 5\mu_4\nu_1 - 10\mu_3\nu_1^2 - 10\mu_2\nu_1^3 - \nu_1^5.$$
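The semi-recursion formula (5) lends itself to a short loop. The sketch below is an illustration under stated assumptions: `central_from_raw` and `binomial_raw_moments` are hypothetical names, the convention $\mu_0 = 1$ is used so that the last term of (5) is $\nu_1^n$, and the raw moments are obtained by direct summation over the binomial series with N = 1.

```python
from fractions import Fraction
from math import comb


def central_from_raw(nu):
    """Moments about the mean from moments about the origin via formula (5).

    nu[k] is the k-th moment about the origin, with nu[0] == 1.
    """
    mu = [Fraction(1), Fraction(0)]  # mu_0 = 1, mu_1 = 0
    for n in range(2, len(nu)):
        mu_n = nu[n] - sum(comb(n, j) * mu[n - j] * nu[1] ** j
                           for j in range(1, n + 1))
        mu.append(mu_n)
    return mu


def binomial_raw_moments(n, p, k_max):
    """nu_0 .. nu_k_max of the binomial (q + p)^n by direct summation."""
    q = 1 - p
    return [sum(comb(n, x) * p**x * q**(n - x) * x**k for x in range(n + 1))
            for k in range(k_max + 1)]


if __name__ == "__main__":
    n, p = 6, Fraction(1, 3)
    print(central_from_raw(binomial_raw_moments(n, p, 4)))
```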

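To calculate $\mu_k$, it is necessary to calculate the $\nu$'s from $\nu_1$ to $\nu_k$. For the
binomial series, these $\nu$'s are

$$\begin{aligned}
\nu_1 &= 1\cdot npq^{n-1} + 2\,\frac{n(n-1)}{1\cdot 2}\,p^2q^{n-2} + 3\,\frac{n(n-1)(n-2)}{1\cdot 2\cdot 3}\,p^3q^{n-3} + \cdots\\
&= np\left[q^{n-1} + \frac{n-1}{1!}\,pq^{n-2} + \frac{(n-1)(n-2)}{2!}\,p^2q^{n-3} + \cdots + p^{n-1}\right] = np\,(q+p)^{n-1} = np,\\
\nu_2 &= np\left[1\cdot q^{n-1} + 2\,\frac{n-1}{1!}\,pq^{n-2} + 3\,\frac{(n-1)(n-2)}{2!}\,p^2q^{n-3} + \cdots + n\,p^{n-1}\right],\\
\nu_3 &= np\left[1^2\cdot q^{n-1} + 2^2\,\frac{n-1}{1!}\,pq^{n-2} + 3^2\,\frac{(n-1)(n-2)}{2!}\,p^2q^{n-3} + \cdots + n^2p^{n-1}\right],\\
\nu_k &= np\left[1^{k-1}q^{n-1} + 2^{k-1}\,\frac{n-1}{1!}\,pq^{n-2} + 3^{k-1}\,\frac{(n-1)(n-2)}{2!}\,p^2q^{n-3} + \cdots + n^{k-1}p^{n-1}\right].
\end{aligned}$$

In the simplified form of $\nu_k$, the [ ] is the (k − 1)th moment about −1 of the
binomial series generated by the binomial $(q + p)^{n-1}$. Denoting this [ ] by
$\nu'_{k-1}(n-1)$, the $\nu$'s can be expressed by the formula

$$\nu_k = np\,\nu'_{k-1}(n-1), \tag{6}$$

where $\nu'$ is a function of (n − 1) and (k − 1) while $\nu_k$ was a function of n and k.

Let us see how a $\nu'$ in $\nu_k$ can be defined in terms of the $\nu$'s of lower order
than k. In finding this relationship, a consideration of the two series of Table II
will be helpful.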

TABLE II 












In general,

$$\nu'_k = \nu_k + {}_kC_1\,\nu_{k-1}(\nu'_1 - \nu_1) + {}_kC_2\,\nu_{k-2}(\nu'_1 - \nu_1)^2 + \cdots + (\nu'_1 - \nu_1)^k. \tag{7}$$

The formula just derived may be used to define the moments about any
origin in terms of those about the original zero of the x's. For our immediate
use, the formula simplifies since $\nu'_1 - \nu_1 = +1$. Then

$$\nu'_k = \nu_k + {}_kC_1\,\nu_{k-1} + {}_kC_2\,\nu_{k-2} + {}_kC_3\,\nu_{k-3} + \cdots \tag{8}$$

By simple analysis we found the value of $\nu_1$ to be np. By the method of
continuation, we are able to extend the list of $\nu$'s to any number. $\nu'$ from (8)
is used in (6) with n replaced by (n − 1) in the $\nu$'s.

$$\begin{aligned}
\nu_0 &= 1.\\
\nu_1 &= np.\\
\nu_2 &= np\,\nu'_1(n-1) = np\left[\nu_1(n-1) + \nu_0(n-1)\right]\\
&= np\left[(n-1)p + 1\right] = n(n-1)p^2 + np.\\
\nu_3 &= np\,\nu'_2(n-1) = np\left[\nu_2(n-1) + 2\nu_1(n-1) + \nu_0(n-1)\right]\\
&= n(n-1)(n-2)p^3 + 3n(n-1)p^2 + np.\\
\nu_4 &= np\,\nu'_3(n-1) = np\left[\nu_3(n-1) + 3\nu_2(n-1) + 3\nu_1(n-1) + \nu_0(n-1)\right]\\
&= np\left\{\left[(n-1)(n-2)(n-3)p^3 + 3(n-1)(n-2)p^2 + (n-1)p\right]\right.\\
&\qquad\left. + 3\left[(n-1)(n-2)p^2 + (n-1)p\right] + 3\left[(n-1)p\right] + 1\right\}\\
&= n(n-1)(n-2)(n-3)p^4 + 6n(n-1)(n-2)p^3 + 7n(n-1)p^2 + np.
\end{aligned}$$

If the order of the terms in the expansion is reversed, $\nu_n$ is an ascending power
series in p. The pure numerical coefficients in some of these $\nu$'s are

$$\begin{aligned}
\nu_1 &= (1)\\
\nu_2 &= (1, 1)\\
\nu_3 &= (1, 3, 1)\\
\nu_4 &= (1, 7, 6, 1)\\
\nu_5 &= (1, 15, 25, 10, 1)\\
\nu_6 &= (1, 31, 90, 65, 15, 1)\\
\nu_7 &= (1, 63, 301, 350, 140, 21, 1)\\
\nu_8 &= (1, 127, 966, 1701, 1050, 266, 28, 1).
\end{aligned}$$
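The continuation method of (6) and (8) can be sketched in a few lines and set against the coefficient rows listed above. This is an illustrative sketch only: the names `raw_moment`, `coefficient_rows` and `raw_moment_from_row` are hypothetical, and the rows are generated by the recurrence for Stirling numbers of the second kind, which is an identification made here and not one stated in the paper.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def raw_moment(k, n, p):
    """nu_k for the binomial (q + p)^n, using nu_k(n) = n p nu'_{k-1}(n - 1)."""
    if k == 0:
        return Fraction(1)
    if n == 0:
        return Fraction(0)
    # Formula (8): nu'_{k-1} is the binomial-weighted sum of nu_0 .. nu_{k-1}.
    return n * p * sum(comb(k - 1, j) * raw_moment(j, n - 1, p) for j in range(k))


def coefficient_rows(k_max):
    """Rows (1), (1, 1), (1, 3, 1), ... built by the Stirling recurrence."""
    rows = [[1]]
    for k in range(2, k_max + 1):
        prev = rows[-1]
        row = [1]
        for j in range(2, k):
            row.append(j * prev[j - 1] + prev[j - 2])
        row.append(1)
        rows.append(row)
    return rows


def raw_moment_from_row(k, n, p):
    """nu_k as an ascending power series in p with falling-factorial factors of n."""
    row = coefficient_rows(k)[-1]
    total, falling = Fraction(0), 1
    for j, coeff in enumerate(row, start=1):
        falling *= n - j + 1
        total += coeff * falling * p**j
    return total


if __name__ == "__main__":
    for row in coefficient_rows(8):
        print(row)
```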
In general,

/ n n / 1-1 \ 

r„+l = ( 1, £ nC., i: Lc. Z , 




( 9 ) 


Using the foregoing $\nu$'s, and the semi-recursion formula, we are able to
determine the $\mu$'s.


$$\begin{aligned}
\mu_2 &= \nu_2 - \nu_1^2\\
&= \left[np + n(n-1)p^2\right] - (np)^2\\
&= np(1-p)\\
&= npq.\\[1ex]
\mu_3 &= \nu_3 - 3\nu_1\mu_2 - \nu_1^3\\
&= \left[np + 3n(n-1)p^2 + n(n-1)(n-2)p^3\right] - 3(np)\left[np(1-p)\right] - [np]^3\\
&= np + (-3n)p^2 + (2n)p^3 = np(1 - 3p + 2p^2)\\
&= np(1-p)(1-2p)\\
&= npq(q-p).\\[1ex]
\mu_4 &= \left[np + 7n(n-1)p^2 + 6n(n-1)(n-2)p^3 + n(n-1)(n-2)(n-3)p^4\right]\\
&\qquad - 4(np)(np)(1 - 3p + 2p^2) - 6(np)^2(np)(1-p) - (np)^4\\
&= np + (-7n + 3n^2)p^2 + (12n - 6n^2)p^3 + (-6n + 3n^2)p^4\\
&= np(1 - 7p + 12p^2 - 6p^3) + 3n^2p^2(1 - 2p + p^2)\\
&= npq - 6np^2q^2 + 3n^2p^2q^2.\\[1ex]
\mu_5 &= np(1 - 15p + 50p^2 - 60p^3 + 24p^4) + 10n^2p^2(1 - 4p + 5p^2 - 2p^3)\\
&= (q-p)(npq - 12np^2q^2 + 10n^2p^2q^2).\\[1ex]
\mu_6 &= np(1 - 31p + 180p^2 - 390p^3 + 360p^4 - 120p^5) + 5n^2p^2(5 - 36p + 83p^2 - 78p^3 + 26p^4)\\
&\qquad + 15n^3p^3(1 - 3p + 3p^2 - p^3)\\
&= npq - 30np^2q^2(q-p)^2 + 25n^2p^2q^2 - 130n^2p^3q^3 + 15n^3p^3q^3.\\[1ex]
\mu_7 &= np(1 - 63p + 602p^2 - 2100p^3 + 3360p^4 - 2520p^5 + 720p^6)\\
&\qquad + n^2p^2(56 - 686p + 2590p^2 - 4270p^3 + 3234p^4 - 924p^5)\\
&\qquad + n^3p^3(105 - 525p + 945p^2 - 735p^3 + 210p^4)\\
&= (q-p)(npq - 60np^2q^2 + 360np^3q^3 + 56n^2p^2q^2 - 462n^2p^3q^3 + 105n^3p^3q^3).\\[1ex]
\mu_8 &= np(1 - 127p + 1932p^2 - 10206p^3 + 25200p^4 - 31920p^5 + 20160p^6 - 5040p^7)\\
&\qquad + n^2p^2(119 - 2394p + 13895p^2 - 35700p^3 + 46004p^4 - 29232p^5 + 7308p^6)\\
&\qquad + n^3p^3(490 - 3850p + 10990p^2 - 14770p^3 + 9520p^4 - 2380p^5)\\
&\qquad + n^4p^4(105 - 420p + 630p^2 - 420p^3 + 105p^4)\\
&= npq\bigl(1 - 42pq(3 - 40pq(1 - 3pq))\bigr) + 7n^2p^2q^2\bigl(17 - 4pq(77 - 261pq)\bigr)\\
&\qquad + 70n^3p^3q^3(7 - 34pq) + 105n^4p^4q^4.
\end{aligned}$$
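Closed forms of this length are easy to mistranscribe, so a direct numerical check is worthwhile. The sketch below, with the hypothetical names `central_moment` and `closed_form`, compares the factored expressions for the second through eighth moments with a brute-force summation over the binomial series; it takes N = 1 and uses exact fractions.

```python
from fractions import Fraction
from math import comb


def central_moment(k, n, p):
    """k-th moment about the mean of the binomial (q + p)^n by direct summation."""
    q = 1 - p
    mean = n * p
    return sum(comb(n, x) * p**x * q**(n - x) * (x - mean)**k
               for x in range(n + 1))


def closed_form(k, n, p):
    """Factored expressions for mu_2 .. mu_8 in terms of n, pq and (q - p)."""
    q = 1 - p
    pq = p * q
    forms = {
        2: n * pq,
        3: n * pq * (q - p),
        4: n * pq - 6 * n * pq**2 + 3 * n**2 * pq**2,
        5: (q - p) * (n * pq - 12 * n * pq**2 + 10 * n**2 * pq**2),
        6: (n * pq - 30 * n * pq**2 * (q - p)**2 + 25 * n**2 * pq**2
            - 130 * n**2 * pq**3 + 15 * n**3 * pq**3),
        7: (q - p) * (n * pq - 60 * n * pq**2 + 360 * n * pq**3
                      + 56 * n**2 * pq**2 - 462 * n**2 * pq**3
                      + 105 * n**3 * pq**3),
        8: (n * pq * (1 - 42 * pq * (3 - 40 * pq * (1 - 3 * pq)))
            + 7 * n**2 * pq**2 * (17 - 4 * pq * (77 - 261 * pq))
            + 70 * n**3 * pq**3 * (7 - 34 * pq) + 105 * n**4 * pq**4),
    }
    return forms[k]


if __name__ == "__main__":
    n, p = 9, Fraction(1, 4)
    for k in range(2, 9):
        print(k, central_moment(k, n, p) == closed_form(k, n, p))
```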



ON CERTAIN DISTRIBUTION FUNCTIONS WHEN THE LAW OF THE
UNIVERSE IS POISSON'S FIRST LAW OF ERROR¹

By Frank M. Weida

Introduction. The median, which is that value of a permuted variable
which has as many observed values on one side of it as on the other, appears to
be the natural competitor of the arithmetic mean when we are interested in the
probable or most probable value of an unknown quantity. It is well known²
that the law of probability, namely, Poisson's first law of error, which results
from the assumption that the median is the most probable value of the unknown
quantity is given by

$$f(x) = \frac{k}{\sigma}\,e^{-\frac{|x|}{\sigma}}. \tag{1}$$

Little is known about the form of the distribution functions of the more 
important statistics when the law of the “Universe” is Poisson’s first law of 
error. It, therefore, appears to be of interest and importance to enlarge our 
present knowledge of distribution functions by finding certain new ones when 
the variable or variables are defined by (1). 
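As a small numerical aside, the constant in (1) is fixed by the requirement that the density have unit area. The sketch below assumes the form $f(x) = (k/\sigma)e^{-|x|/\sigma}$ on the whole real line; the helper names `laplace_density` and `integrate` are hypothetical, and the trapezoidal rule over a wide finite range stands in for the infinite integral.

```python
from math import exp


def laplace_density(x, sigma, k=0.5):
    """Poisson's first law of error in the form (k / sigma) * exp(-|x| / sigma)."""
    return (k / sigma) * exp(-abs(x) / sigma)


def integrate(f, lo, hi, steps=200_000):
    """Plain trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * h)
    return total * h


if __name__ == "__main__":
    # The area equals 2k, so it is 1 only when k = 1/2.
    print(integrate(lambda x: laplace_density(x, 2.0), -80.0, 80.0))
    print(integrate(lambda x: laplace_density(x, 2.0, k=1.0), -80.0, 80.0))
```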

In this paper we present the following results: (1) We have obtained an
explicit expression for the distribution of means of samples of n; (2) we have
obtained an explicit expression for the distribution of differences; (3) we have
obtained an explicit expression for the distribution of quotients; (4) we have
obtained an explicit expression for the distribution of standard deviations for
samples of n; (5) we have obtained an explicit expression for the distribution of
geometric means for samples of n; (6) we have obtained an explicit expression
for the distribution of harmonic means for samples of n.

In our analysis, we have made use of the theory of characteristic functions in
the sense of Lévy.³ This theory has been extended to more than one dimension
by V. Romanovsky⁴ and by E. K. Haviland.⁵ S. Kullback,⁶ in his thesis, has
made further extensions and has applied them successfully to the distribution
problem in statistics.

¹ Presented to the American Mathematical Society, February 23, 1935.

² Brunt, David: "The Combination of Observations," 1923, p. 27.

³ Lévy, P.: "Calcul des Probabilités," pp. 153-191.

⁴ Romanovsky, V.: "Sur un théorème limite du calcul des probabilités," Recueil mathématique
de la Société mathématique de Moscou, Vol. 36, 1926, pp. 36-64.

⁵ Haviland, E. K.: "On the inversion formula for Fourier-Stieltjes transforms in more
than one dimension," American Journal of Mathematics, Vol. 57, 1935, pp. 94-101.

⁶ Kullback, S.: "An application of characteristic functions to the distribution problem
of statistics," Annals of Mathematical Statistics, Vol. V, No. 4, pp. 263-307.
The explicit expression for the distribution of arithmetic means of samples of
n is not new. This law of distribution has previously been obtained otherwise
by F. Hausdorff⁷ and by A. T. Craig.⁸ It is inserted here to show the superiority
and greater power of our method when compared with previous methods and
for the completeness of our discussion. The other results offered in this paper,
as far as the writer knows, are new.


1. The distribution of arithmetic means. Let us consider

$$f(x) = \frac{k}{\sigma}\,e^{-\frac{|x|}{\sigma}}, \qquad (-\infty < x < \infty). \tag{2}$$

If we assume that $x_1, x_2, \ldots, x_n$ are independently distributed and that each
$x_i$ (i = 1, 2, ..., n) is distributed according to the same distribution law, namely,
Poisson's first law of error, then it is fairly easy to see that the characteristic
function for the law of distribution of means of samples of n is given by



k tt\x\ 

- e 
<r 


I 3=1 ^ n 

^ dxj , 


(3) 


If u = (i = 1, 2, • • • , n), then it follows that the distribution function 
of u, namely, P(u), is given by 

s I' *}" * ' «) 

which, upon simplification becomes 


Pi-u) 


'2”'^k’‘ dt 

■Ku" (1 - oiiY ' 


(5) 


It is readily seen that the poles of the integrand are of the nth order and are
those of $(1 - \sigma i t)^{-n}$. It follows by the well known Residue Theorem of Cauchy⁹
that

F(u) _ 1 


7r<r 


(n-l)! z” + 


( 6 ) 


If now, we replace u by n | x | , we will obtain the desired law of the distribu- 
tion of arithmetic means of samples of n which is 




e 


nlTilxj 


- 1)1 Un- 

defined for all values of x on the range a < x < a) 


(7) 


⁷ Hausdorff, F.: Beiträge zur Wahrscheinlichkeitsrechnung, Königlich Sächsischen
Gesellschaft der Wissenschaften zu Leipzig, Berichte über die Verhandlungen, Math.-
Phys. Classe, Vol. 53, 1901, pp. 152-178.

⁸ Craig, A. T.: "On the distribution of certain statistics," American Journal of
Mathematics, Vol. 54, 1932, pp. 353-366.

⁹ Macrobert, T. M.: "Functions of a Complex Variable," 1933, pp. 57, 295.
A. T. Craig⁸ has given the distribution laws of arithmetic means of samples of
size 2, 3, and 4. These results as well as the results for any n are readily
obtained from (7).


2. The distribution of differences. Let us assume that the laws of distribution
of x and y are independent and that they are given respectively by

$$f(x) = \frac{k_1}{\sigma_1}\,e^{-\frac{|x|}{\sigma_1}};\qquad f(y) = \frac{k_2}{\sigma_2}\,e^{-\frac{|y|}{\sigma_2}};\qquad (-\infty < x < \infty),\ (-\infty < y < \infty).$$

In this case, the characteristic function of the law of distribution of
differences (x − y) is given by


Ivl 


dy. 


♦6). rc'*'- 

Performing the operations indicated in (8) and simplifying, we find that

. 1 _ 1 

CTlCTj (1 — cxiit) (1 -|- ' 

It is fairly easy to see that the distribution law of u is given by 

p( ) — e~'*“ dt 

2 iriTiC 2 j-a (1 — criit){l + (Ttit) ’ 

Now, let {(l/<ri) — it] — v/u, then (10) becomes 


P(u) = 2fcifc;e r rj’*'’" 

irio'i(r2((7i 4- 0 - 2 ) / « 

' J tn 

* 

The integral in (11) is convergent because 

e~' dv 


e-” dv 


i-v)^ 

'1 + / " . 1 



[ \ viffa /J 


(- v) 


1 + 


\ o'icr2 / 


0 . 


Hence, we find that 


p, . 2hk2e '1 r"+> 

P{h) T T f 

nai(ri{(Ti + (Tt) Ja 


e~'’ dv 


i-v) 


1 + 


/ ffi -f 0-2 \ f 


( 8 ) 


( 9 ) 


(10) 


( 11 ) 


(12) 
which upon simplification becomes


P(u) ^ ^JZL±I^u 


0’l0'2(ffi <r%) ®»2 i <^10'2 


(13) 


where W. i 


(71 + era 


2 t <ri<r2 


w> is the confluent hypergeometric function.^' 


It is well known that 




dt 


for all values of k and m and for all values of z except negative real values. 
Clearly, 

ffi+U, 


*2 

which, upon simplification becomes 

™ , /ffl + 0-2 

0 , 

Hence, we now find that 


"•I I <7^(7, / r(i) jo 


2 I 


'l+'2 . 


u> = e 


P(w) = ^27^,' 

(TiO-iCeri + o-j) 


(14) 


(15) 


If now, we replace u by |x| − |y|, we will obtain the desired law of
distribution of differences which is


P(\x\-\y\) = 


ikikt 


(7i(7ii<7i -f Oj) 


2 1^2 ' * 


(16) 


3. The distribution of ratios. We assume that the laws of distribution of
x and y are independent and that they are given respectively by

$$f(x) = \frac{k_1}{\sigma_1}\,e^{-\frac{|x|}{\sigma_1}};\qquad f(y) = \frac{k_2}{\sigma_2}\,e^{-\frac{|y|}{\sigma_2}};\qquad (-\infty < x < \infty),\ (-\infty < y < \infty).$$

¹⁰ Whittaker, E. T. and Watson, G. N.: "A Course of Modern Analysis," 1915, pp. 333-
334.
Let u = log |x| − log |y|. The characteristic function of the law of
distribution of quotients is then given by


1, /"o -1^ i- /■« -~ 

Cl J~a J—a 


e 'la:*' dx 


fc V 

I, 


= 4^ [“ 

xiffi Jo 

Now, let $s = x/\sigma_1$ and $w = y/\sigma_2$, then clearly

(pit) = j e"'*s*‘ ds j e~^'w~^‘d'W, 

whence 

= 4:kJc,(ri‘cr-;'*V{it + l)r(l - it). 


®2r“ dy . 


(171 


(18) 


It follows that the distribution law of u is given by 

p{u) = [“ + 1) r(l - it) dt 

2ir J-a 

which upon simplification, becomes 

P(u) = e-‘<’‘-‘“*''i+'“'''' 2 ^‘r(« -f 1) r(l - it) dt. (19) 

TT 


Now, let (1 — it) ~ —V, then (19) becomes 

P(u) = [ e-''l“-‘u 8 ''i+io«» 2 l- l»-ioK''i+>“«'>' 2 l r (2 •+• r) r(— r) dti. ( 20 ) 

2« 


Since it can be shown thatii 

(l/2rt) / ^ e-''“r(2 4 - v) r(- v) dv = r( 2 ) {1 + (l/ 6 “) }-* , 

we find that (20) becomes 

Pin) = ^^1^1 r(2) (l + . (21) 

0-2 ( (T2e'‘) 

Now, put $e^u = |x|/|y| = R$, whence from (21) we will obtain the desired law
of distribution of quotients which is


P{R) — 4fei^2<ril'(2) 

(tiR 



(22) 


¹¹ Macrobert, T. M.: "Functions of a Complex Variable," 1933, pp. 114, 139, 151.
Whittaker, E. T. and Watson, G. N.: "A Course of Modern Analysis," 1915, p. 283.
4. The distribution of variances and standard deviations. If we assume
that the variance and standard deviation are calculated about a sample mean

n — X 

and if we let u — S.- if the a;, are independently distributed and each 

t“l 

a;, is distributed according to the same distribution law, namely, Poisson's first 
law of error, then it is clear that the characteristic function for the law of 
distribution of variances of samples of n is 


./A A /*“ .W /2fc /■“ 


e "ax 


(23) 


Let I represent the integral in the right-hand member of (23). We obtain 

_ 1^ 

that (dl/da) ~ I/o^, whence I = Ce Making use of the conditions:. 


<r — »a , 


i: 


dx 


h"' Vir 
Vt’ 


(T a, Ce " — > C, whence we find that 


/; 


e dx 6 

Vi 


— — p. " . 


Clearly, it follows that 

«(t) = 


( n — 1 )t % n—l 


e ^ . 


(24) 




We now find that the distribution law of u is given by 

( n — 1 )tr t n — 1 ^ n — 1 

IT ^ e 


Piu) = 


2n-l^„-lg 4 _ 2 


2'ir(r" 2 




(25) 


Evaluating the integral in (25) with a suitably chosen contour, we find that 

(26) 


, 2'‘-"^;"-v“2'e ”'*271 

P(m) = ; M 2 e 


2,,- r(:L^) 


Now, let u = whence from (26) we will obtain the desired law 


of distribution of variances which is 


n — 1 n — 1 


P(s*) = 


2 e ° 


n 


n — 3 

2 f.C*') * 


(27) 


Macrobert, T. M., “Functions of a Complex Variable," 1933, p, 67. 
The law of distribution of standard deviations can be obtained at once from
(27) since $d(s^2) = 2s\,ds$.

We shall now give the specific laws of distribution of variances for samples of
size 1, 2, 3, 4, and 5 when the law of the "Universe" is Poisson's first law of error.
From (27),

For n — 1, 

P{s^) = 0, (0 < s^ < oo). ^28) 

For n = 2, 




i -i-ji! 

2^ke "e 


(TS 


(0 < s* < 00 ) . ^29) 


For n = 3, 


P(s2) = 


4fcVe 



(7* 


For n = 4, 

% 

„/ jv 32fcVe" 'se"*® 

«“•> . 


For w = 5, 

80 fe‘ir^e “'s’’e~'* 
F{s^) = 


(0 < s* < oo). (30) 


(0 < s=* < oo). (31) 


(0<s2<oo). (32) 


5. The distribution of geometric means. As before, we assume that the
$x_i$ are independently distributed and each $x_i$ is distributed according to the same
distribution law, namely, Poisson's first law of error. Then, clearly, the
characteristic function for the law of distribution of geometric means of samples of n is





Now, put s = x/o, then (33) becomes 





tr" s’* ds 


2’*fc”(r"“{r(if -4-1)}”. 


(33) 


(34) 


It follows at once that the distribution law of u is 


( 35 ) 
Now, let it + 1 = −v, then (35) becomes


P(u) = -----T- e 


-1 l-in 


„u+nl0BT / g»(i<+«l«*«» I p( _ I „ 

2iri y-i-ia 


It is well known that (10) 


{r(-y)}” = ~ 


(_1)»T« 


sin" ir«{r(D + 

Using (37) in (36), we readily find that 

-2"^" ~ D" 

2n 




(36) 


(37) 


(38) 


It is fairly easy to see that the poles of the integrand in (38) are the poles of
$\{\Gamma(-v)\}^n$ and that these poles are of the nth order. Applying the well known
Residue Theorem of Cauchy (8), we find that

« 

P{u) = ^ 

a » 0 


^ ][^n-f-rta+l 

{n - 1)! 


fd^ 

fjn Uj-itir) “jN 


jf(r+ i)}"J/ 


(39) 


Now, since « = log | Ti | + log | a;^ | + . . . + log ! !, then clearly, the dis- 

tribution law of the geometric mean, G, is obtained from the law of diHtHKntion 
for w by means of the transformation 


Hence, from (39), we find the desired law of distribution of geonn.trie means 
of samples of n which is 


a4i W'*'L!r(t*+ i)l' 


rmu 


(40) 


6 . The disttbution of harmonic means. Let us assume that f{x) is the 
*' " “ O' 

F(x') = (l/a;'“)/(l/x') 

fi l/x IS continuous on the range of definition of f(>»\ \r • • 

Poisson's first law of error, we find that ^ ^ /(«) 


''M - FCl/.) . ‘ ; 

cr ^ 


(-~a^x<0), (0<T^a). 


(41) 

ican Mathit;! SiftrvT 31^21^ T' of the Amer- 

ablea with given frequency laws ” Annala ni Ar'tk ^piuenoy law of » function of vari' 
p, 18 . Second Series, ¥01.27, 1926-26, 
We assume that the $x'_i$ are independently distributed and each $x'_i$ is
distributed according to the same law of distribution, whence we find that the
characteristic function for the law of distribution of harmonic means of samples
of n is


♦ft), 


from which, after simplification, we find that 

fc"2” 




(1 — aity” ’ 

We now find that the law of distribution for u is 


= -sr L (1^ 


it) 

which, after evaluation and simplification, becomes 




Piu) 


r’>r(3ji) 




(42) 


(43) 


(44) 


Recalling that in this case, $u = 1/|x_1| + 1/|x_2| + \cdots + 1/|x_n|$, we make
the transformation u = n/H, where H is the harmonic mean; whence, from
(44), we find that the desired law of distribution of harmonic means of samples
of n is given by


P(ff) 


(T^rCSn.) 


n 


(45) 


7. Conclusions. We have shown that the same analysis is applicable to find 
the explicit expression for all the distribution laws we have discussed in this 
paper. 

The George Washington University,

Washington, D. C. 


ERRATA 

In my paper* there appear two blunders which were called to my attention by A. T. Craig.

In section 4, pages 107-108, headed "The distribution of variances and standard
deviations," I have obtained the distribution function of the sum of the squares of n − 1
independent values of x and not the distribution function of the sum of the squares of the
deviations from the sample mean of the n independent values of x.

In section 2, pages 104-105, headed "The distribution of differences," I have obtained
the distribution function of the differences of absolute values and not the distribution
function of the actual differences.


* Weida, F. M., "On Certain Distribution Functions when the Law of the Universe is Poisson's First Law of
Error," Annals of Mathematical Statistics, Vol. VI, No. 2, June, 1935, pp. 102-110.
ON THE PROBLEM OF CONFIDENCE INTERVALS

By J. Neyman

When discussing my paper read before the Royal Statistical Society on 19th 
June, 1934, Professor Fisher said that the extension of his work concerning the 
fiducial argument to the case of discontinuous distributions, as presented in 
my paper, has been reached at a great expense: that instead of exact probability 
statements we get only statements in the form of inequalities. 

This remark raises the question whether the disadvantage of the solution 
which he mentioned (the inequalities instead of equalities) results from the un- 
satisfactory method of approach, or whether it is connected with the nature of 
the problem itself. 

I think that the problem is of considerable general interest. For instance it
may be asked whether the confidence intervals for the binomial distribution
recently published by E. S. Pearson and C. J. Clopper,¹ which correspond to
the probability statements in inequalities, could be bettered.

The purpose of the present note is to show (1) that in some exceptional cases
the exact probability solution of the problem exists and that then it may easily
be found by the method described in Note I of my paper,² and (2) that in the general
case of discontinuous distribution exact probability statements in the problem
of confidence intervals are impossible.

In particular it will be seen that exact probability statements are impossible 
in the case of the binomial distribution and so that the system of confidence 
intervals published by Clopper and Pearson could not be bettered. 

In order to avoid any possible misunderstanding I shall start by restating 
the problem. 

We shall consider a random discontinuous variate x, capable of having one
or another of a finite, or at most denumerable, set of values

$$x_1, x_2, \ldots, x_n, \ldots \tag{1}$$

We shall assume that the frequency function, say $p(x \mid \theta)$, of x depends upon one
parameter $\theta$, the value of which is unknown. The problem of confidence
intervals consists in ascribing to every possible value of x, e.g. to $x_n$ (n = 1, 2, ...),
a "confidence interval," say $\theta_1(n)$ to $\theta_2(n)$, such that the probability, P, of our
being correct in stating

$$\theta_1(n) \le \theta \le \theta_2(n) \tag{2}$$

whenever we observe $x = x_n$ (n = 1, 2, ...), is either:

¹ E. S. Pearson and C. J. Clopper: The Use of Confidence or Fiducial Limits in the
Case of the Binomial. Biometrika, Vol. XXVI, pp. 404-413.

² J. R. S. S., Vol. 97, p. 689.


(a) equal to a given value $\alpha < 1$ chosen in advance, or

(b) at least equal to this value $\alpha$.

I proposed to call this chosen value $\alpha$ the confidence coefficient.

In the earlier paper I showed that the solution of the problem in its form (b)
is always possible and easy to find. If the variate x is continuous, then the
solution of the problem (a) is equally easy. At present we shall consider whether
and under what conditions the solution (a) is possible when the variate x is
discontinuous.

Suppose that the variate x is discontinuous as described above, and that the
solution of the problem in its form (a) exists and is given by the system of
confidence intervals $(\theta_1(x_n), \theta_2(x_n))$ for n = 1, 2, ....

The position is illustrated in the diagram below. On the axis of abscissae
the possible values of the variate x are marked. The axis of ordinates is the
axis of $\theta$. The confidence intervals are marked on verticals passing through
corresponding values of x.


[Diagram representing the confidence intervals. Abscissae: the possible values $x_1, x_2, x_3, \ldots$ of the variate x; ordinates: $\theta$. Each interval $(\theta_1(n), \theta_2(n))$ is drawn on the vertical through $x_n$; the marked points are those belonging to the set of acceptance.]



According to our hypothesis the intervals $(\theta_1(x_n), \theta_2(x_n))$ are so chosen that

$$P = \alpha \tag{3}$$

$P$ is the probability of an event, say $E$, which we shall describe in some detail.
Let us denote generally the probability of any event $a$ by $P\{a\}$. $P\{a \mid b\}$ will
denote the probability of an event, $a$, calculated under the assumption that
another event, $b$, has already occurred.
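Now

$$
\begin{aligned}
P = P\{E\} &= \text{the probability that } \{\text{either } (x = x_1) \text{ and then } \theta_1(1) \le \theta \le \theta_2(1), \\
&\qquad \text{or } (x = x_2) \text{ and then } \theta_1(2) \le \theta \le \theta_2(2), \\
&\qquad \dots \\
&\qquad \text{or } (x = x_n) \text{ and then } \theta_1(n) \le \theta \le \theta_2(n), \ \dots\} \\
&= P\{x = x_1\}\, P\{\theta_1(1) \le \theta \le \theta_2(1) \mid (x = x_1)\} \\
&\quad + P\{x = x_2\}\, P\{\theta_1(2) \le \theta \le \theta_2(2) \mid (x = x_2)\} + \dots \\
&= \sum_{n \ge 1} P\{x = x_n\}\, P\{\theta_1(n) \le \theta \le \theta_2(n) \mid (x = x_n)\} = \alpha
\end{aligned}
\tag{4}
$$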

The calculation of the probability $P$ in the above form is not convenient, as
both multipliers in each term of the sum in (4) depend upon the unknown
probability function a priori of $\theta$. Therefore we shall present $P$ in another
form, giving to the event $E$ a geometrical interpretation. Let us denote by
$CB$ the set of all confidence intervals $(\theta_1(n), \theta_2(n))$, as marked on the plane of
$x$ and $\theta$. Thus $CB$ will be composed of points with co-ordinates $x$ and $\theta$, where

$$x = x_n, \qquad \theta_1(n) \le \theta \le \theta_2(n) \tag{5}$$

The set $CB$ will be called the confidence belt.

Denote by $A$ any point of the plane of $x$ and $\theta$, having any values for its
co-ordinates.

It is easily seen that the event, which we denote by $E$, and the probability
of which is $P = \alpha$, consists in the point $A$ belonging to the confidence belt $CB$.
In fact the event $E$ occurs if and only if the co-ordinates of $A$ fulfil the
conditions (5). But just these conditions define the points belonging to $CB$.

The above circumstance allows us to calculate $P$ by means of a formula which
discloses its connection with $p(x \mid \theta)$.

Fix any possible value of $\theta = \theta'$ and draw the straight line $LL$ the points
of which have just this fixed value $\theta'$ for their ordinates. The line $LL$ will cut
some of the confidence intervals. Denote by $X(\theta')$ the set of points of
intersection, and by $\phi(\theta)$ the unknown frequency function of $\theta$. The set $X(\theta)$ will
be called the set of acceptance corresponding to the specified value of $\theta$.

The function $\phi(\theta)$ may be continuous or not. So may be $p(x \mid \theta)$ considered
as a function of $\theta$. These cases may be treated together if we agree that
$\sum_{\theta} F(\theta)$ will denote either the sum or the integral of $F(\theta)$ extending over all
values of $\theta$, whenever $F(\theta)$ is integrable.

Using this notation we may write

$$P = P\{E\} = \sum_{\theta} \Big\{ \phi(\theta) \sum_{X(\theta)} p(x \mid \theta) \Big\} \tag{6}$$

where $\sum_{X(\theta)}$ denotes the summation over all values of $x$ belonging to $X(\theta)$.

From the formula (6) may be deduced the following important proposition.
The probability $P$ may possess a constant value $\alpha$, independent of the properties
of the unknown function $\phi(\theta)$, if and only if for each $\theta$

$$\sum_{X(\theta)} p(x \mid \theta) = \alpha \tag{7}$$

The condition (7) is obviously sufficient to have $P = \alpha$. In fact, if it is satisfied,
then we should get from (6)

$$P = \alpha \sum_{\theta} \phi(\theta) = \alpha \tag{8}$$

since $\sum_{\theta} \phi(\theta) = 1$ whatever the frequency distribution of $\theta$. It is equally
easy to see that the condition (7) is necessary for having $P = \alpha$ whatever the
function $\phi(\theta)$. For suppose that for $\theta = \theta_1$ we have

$$\sum_{X(\theta_1)} p(x \mid \theta_1) = \beta \ne \alpha \tag{9}$$

Then if it happens that

$$\phi(\theta) = 1 \quad \text{for } \theta = \theta_1 \tag{10}$$

and

$$\phi(\theta) = 0 \quad \text{for } \theta \ne \theta_1 \tag{11}$$

the only term in the sum $\sum_{\theta}$ which is different from zero will be that
corresponding to $\theta = \theta_1$ and the formula (6) will reduce to

$$P = \sum_{X(\theta_1)} p(x \mid \theta_1) = \beta \ne \alpha. \tag{12}$$

The original question, whether the solution of the form (a) is possible when
the variate $x$ is discontinuous, is thus put in the following form: is it possible
to define for every possible value of $\theta$ a set of acceptance $X(\theta)$ such that the
equation (7) holds good?

The answer is: in some cases it may be possible, but this depends upon the
nature of the function $p(x \mid \theta)$. It is very easy to invent functions $p(x \mid \theta)$ for
which the equation (7) for a definite value of $\alpha$ holds good, and we may even
fix in advance the sets of acceptance $X(\theta)$. However the important question
is not whether there may exist elaborately invented cases of discontinuous
distributions where the solution (a) exists, but rather whether this solution
exists always, or at least whether it exists frequently and in cases which are
practically important.

This question must be answered in the negative on the basis of the following
example concerning the most important of the discontinuous distributions, the
Binomial.

In fact it will be seen below that if $x$ is a variate following the binomial
frequency law, then whatever the arrangement of the sets of acceptance $X(\theta)$,
corresponding to different values of $\theta$, the left hand side of the equation (7)
cannot be constantly equal to the confidence coefficient $\alpha < 1$. It will follow
that in the case of the binomial distribution, the solution of the problem (a)
is impossible.
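
The quantity on the left hand side of (7) is easy to tabulate for a small binomial case. The following Python sketch is only an illustration: the helper names and the three intervals for n = 2 are hypothetical choices, not taken from the paper. It shows the sum over the set of acceptance changing as $\theta$ moves.

```python
from math import comb

def binomial_pmf(x, n, theta):
    """p(x | theta) for the binomial law with index n."""
    return comb(n, x) * theta**x * (1 - theta) ** (n - x)

def acceptance_set(theta, intervals):
    """X(theta): the values x whose interval (lower, upper) covers theta."""
    return [x for x, (lower, upper) in enumerate(intervals) if lower <= theta <= upper]

def coverage(theta, intervals, n):
    """Left hand side of (7): the sum of p(x | theta) over X(theta)."""
    return sum(binomial_pmf(x, n, theta) for x in acceptance_set(theta, intervals))

# Hypothetical intervals for x = 0, 1, 2 (n = 2), chosen only for illustration.
intervals = [(0.0, 0.6), (0.1, 0.9), (0.4, 1.0)]

for theta in (0.05, 0.3, 0.5, 0.8):
    print(theta, acceptance_set(theta, intervals), round(coverage(theta, intervals, 2), 4))
```

With these made-up intervals the sum takes the values 0.9025, 0.91, 1.0 and 0.96 at the four chosen values of $\theta$, so it is not constant.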

To prove this we shall consider the variate, $x$, following the binomial frequency
law. That is to say we shall assume that $x$ may have values $0, 1, 2, \dots, n$,
and that

$$p(x \mid \theta) = \frac{n!}{x!\,(n - x)!}\, \theta^x (1 - \theta)^{n - x} \tag{13}$$

while $0 < \theta < 1$. Since the set of possible values which $x$ may have is finite,
the set of all confidence intervals must be finite also. It follows that there
is possible only a finite number of sets of acceptance $X(\theta)$. Therefore there
must be at least one set of acceptance, say $X^0$, which will be common to an
infinite number of values of $\theta$, say $\theta_1, \theta_2, \dots, \theta_n, \dots$, so that for each it will
be $X(\theta_n) = X^0$.

Now

$$\sum_{X^0} p(x \mid \theta_n) \tag{14}$$

for all these values of $\theta = \theta_n$ will be the same polynomial in $\theta$ of the order $n$.
If it has the same value $\alpha$ for a number of values of $\theta$ exceeding $n$, it means that
this polynomial is an absolute constant. Therefore if it were possible to give
a solution of the type (a) in the case of the binomial distribution, it would be
possible to construct a sum (14), the terms of which are all different and have
the form (13), and such that after all possible reductions and simplifications
all terms involving $\theta$ would cancel and we should be left only with one constant
term $\alpha < 1$. This, however, is impossible, since the only term of the form (13)
which involves a constant is the term corresponding to $x = 0$,

$$p(0 \mid \theta) = (1 - \theta)^n = 1 - n\theta + \frac{n(n-1)}{2}\,\theta^2 - \dots \tag{15}$$

and then this constant is 1. Other terms of the form (13) involve $\theta^x$ as a
multiplier. Therefore there exists only one sum of the form (14) which is an absolute
constant, but this includes all the terms (13)

$$\sum_{x=0}^{n} p(x \mid \theta) = 1 \tag{16}$$

and thus is of no value. It follows that whatever the sets of acceptance $X(\theta)$
the corresponding sum (14) will have values varying with the value of $\theta$ and
hence the solution of the type (a) in the case of the binomial does not exist.
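
The algebraic argument can be checked by brute force for a small $n$. The sketch below is an illustration only (the function names and the choice n = 4 are assumptions made here): it expands the sum (14) as a polynomial in $\theta$ for every non-empty set of acceptance and keeps those sets for which the coefficients of $\theta, \theta^2, \dots$ all vanish.

```python
from itertools import combinations
from math import comb

def sum_coefficients(subset, n):
    """Coefficients c[0..n] of the sum over x in subset of C(n,x) t^x (1-t)^(n-x)."""
    coeffs = [0] * (n + 1)
    for x in subset:
        for k in range(n - x + 1):
            # (1 - t)^(n - x) contributes C(n - x, k) * (-1)^k * t^k
            coeffs[x + k] += comb(n, x) * comb(n - x, k) * (-1) ** k
    return coeffs

def constant_subsets(n):
    """Non-empty sets of acceptance whose sum (14) does not depend on theta."""
    found = []
    for size in range(1, n + 2):
        for subset in combinations(range(n + 1), size):
            # the polynomial is constant when every coefficient beyond c[0] is zero
            if not any(sum_coefficients(subset, n)[1:]):
                found.append(subset)
    return found

print(constant_subsets(4))
```

For n = 4 the only set returned is the complete set (0, 1, 2, 3, 4), for which the sum is identically 1, in agreement with (16).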

This, I think, gives the solution of the question raised by Professor Fisher. 
It is clear also that whenever the solution of the type (a) exists, it may be 
found by a suitable choice of sets of acceptance, and thus by the method
explained in my earlier paper.

I should like now to raise another question. Past experience shows that the 
general problem of estimation may be formulated in different ways. The form 
of this problem as it appears in Bayes' theorem required for its solution the
knowledge of the probabilities a priori.

The form of the same problem treated by R. A. Fisher in his theory of
estimation was solved in terms of a new conception, that of likelihood.

The problem of estimation in its form of confidence intervals stands entirely 
within the bounds of the theory of probability, without involving any
conception not already inherent in this theory. In the case of continuous distribution
the problem also allows the solution (a) entirely independent of the probabilities
a priori. Now it is shown that the necessity of the solution (b) is bound up
with the nature of the problem if the distributions are discontinuous.

My question is: is it possible to formulate the problem of estimation in a 
fourth form, leading to a solution which (1) stands entirely on the grounds of 
the classical theory of probability, and (2) does not depend upon the
probabilities a priori, whatever the conditions of the problem?



ANALYSIS OF VARIANCE CONSIDERED AS AN APPLICATION OF 
SIMPLE ERROR THEORY 

By Walter A. Hendricks 


The need for an elementary presentation of the methods of analysis of vari- 
ance has been recognized by many investigators in various fields of research. 
A recent monograph by Snedecor (1934) is undoubtedly the most comprehensive 
attempt to satisfy this need which has appeared in the literature relating to 
the subject. Snedecor's treatment of the subject consists largely of the presen- 
tation of a number of standard types of problems to which the methods of 
analysis of variance are applicable, directions for performing the necessary com- 
putations, and a discussion of the conclusions which may be drawn from the 
data on the basis of the analysis. 

In the opinion of the author of this paper, an elementary presentation of some 
of the theoretical considerations upon which the methods of analysis of variance 
are based would also be of some value. The methods of analysis of variance, 
as given by Fisher (1932), are presented as a natural consequence of intraclass 
correlation theory. However, the essential concepts may be presented in a 
more comprehensible form by the use of simple error theory. 

It seems appropriate to begin such a presentation with a definition of variance. 
If we have an infinite number of measurements of the same quantity, the 
variance of a single measurement is defined as the arithmetic mean of the 
squares of the errors of those measurements. In actual practice, an infinite 
number of measurements can never be obtained. We have instead a sample 
of $n$ measurements, $x_1, x_2, \dots, x_n$, from which the variance of a single
measurement may be estimated. By referring to any text on the method of least
squares, it may be verified that the best estimate, $S^2$, of the variance of a single
measurement which can be obtained from a sample of $n$ measurements is given
by the equation:

$$S^2 = \frac{\sum (x - m)^2}{n - 1} \tag{1}$$

in which $m$ represents the arithmetic mean of the $n$ measurements. The
quantity, $n - 1$, in the terminology of analysis of variance, is designated as
the number of degrees of freedom available for estimating $S^2$.

It is often necessary to estimate $S^2$ from a number of different samples of
measurements. In such cases, the best estimate of $S^2$ is obtained by calculating
the weighted mean of the variances estimated from the individual samples, each
variance being weighted by the number of degrees of freedom which were
available for its estimation. The number of degrees of freedom upon which such an
estimate of $S^2$ is based is given by the sum of these weights. Such an estimate
of the variance of a single measurement is often designated as the variance
“within samples.”

In one of the simpler applications of analysis of variance, a number of samples 
of measurements are available, and the investigator is required to determine 
whether the magnitude of the quantity measured varied from sample to sample 
or whether all of the measurements may be regarded as having been made upon 
a quantity of the same magnitude. 
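
The variance “within samples” described above, i.e. the weighted mean of the individual sample variances with the degrees of freedom as weights, can be sketched in Python as follows. The function names and the measurements are made up for illustration and are not taken from the paper.

```python
def sample_variance(xs):
    """Equation (1): sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)

def variance_within_samples(samples):
    """Weighted mean of the sample variances; weights are the degrees of freedom."""
    weights = [len(s) - 1 for s in samples]
    pooled = sum(w * sample_variance(s) for w, s in zip(weights, samples)) / sum(weights)
    return pooled, sum(weights)  # the estimate and its degrees of freedom

# Hypothetical measurements in two samples.
samples = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0, 8.0]]
print(variance_within_samples(samples))
```

For these figures the two sample variances are 1 and 20/3, with 2 and 3 degrees of freedom, so the pooled estimate is 22/5 = 4.4 on 5 degrees of freedom.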

An estimate, $S^2$, of the variance within samples may be obtained. Since $S^2$
is an estimate of the variance of a single measurement, the variance, $S_i^2$, of the
arithmetic mean, $m_i$, of the measurements in any one sample is given by the
equation:

$$S_i^2 = \frac{S^2}{n_i} \tag{2}$$

in which $n_i$ represents the number of measurements in the sample. Let there
be $r$ samples. Then another estimate, $S_i'^2$, of the variance of the mean, $m_i$,
may be obtained from the observed distribution of the means, $m_1, m_2, \dots, m_r$,
by the use of the formula for calculating the variance of a weighted observation
as given in texts on the method of least squares:

$$S_i'^2 = \frac{1}{n_i (r - 1)} \left[ n_1 (m_1 - m)^2 + n_2 (m_2 - m)^2 + \dots + n_r (m_r - m)^2 \right] \tag{3}$$

in which:

$$m = \frac{n_1 m_1 + n_2 m_2 + \dots + n_r m_r}{n_1 + n_2 + \dots + n_r} \tag{4}$$

Equations (2) and (3) yield two estimates of the variance of the mean, $m_i$.
It is apparent that these two estimates will be equal, within the limits of
sampling fluctuations, if all of the measurements in the $r$ samples were made upon
a quantity of the same magnitude. If the magnitude of the quantity measured
varied from sample to sample, $S_i'^2$ will be greater than $S_i^2$. However, in actual
practice, the two estimates of the variance of a particular mean are not
compared directly. An equivalent comparison is made between two estimates of
the variance of a single measurement. The first of these is nothing more than
the variance within samples discussed earlier in this paper. The second
estimate, which may be designated by $S'^2$, is the value which would have to be
substituted for $S^2$ in equation (2) in order to make $S_i^2$ equal to the value given
for $S_i'^2$ by equation (3). It is quite apparent that $S'^2$ may be found by the
use of the equation:

$$S'^2 = \frac{1}{r - 1} \left[ n_1 (m_1 - m)^2 + n_2 (m_2 - m)^2 + \dots + n_r (m_r - m)^2 \right] \tag{5}$$

$S'^2$ is often designated as the variance “between samples.” A comparison of
$S'^2$ with $S^2$ is obviously equivalent to a comparison of $S_i'^2$ with $S_i^2$.
If $S'^2$ is greater than $S^2$, a statistic, $z$, may be calculated:

$$z = \frac{1}{2} \log_e \frac{S'^2}{S^2} \tag{6}$$


This statistic serves as a useful comparison between $S'^2$ and $S^2$ since its sampling
distribution is known if all of the measurements comprising the data under
investigation were made upon a quantity of the same magnitude. The
distribution of $z$, under these conditions, is given by an equation of the form:

$$df = \frac{2\, n_1^{n_1/2}\, n_2^{n_2/2}}{B\!\left(\tfrac{1}{2} n_1, \tfrac{1}{2} n_2\right)} \cdot \frac{e^{n_1 z}\, dz}{\left(n_1 e^{2z} + n_2\right)^{(n_1 + n_2)/2}} \tag{7}$$

in which $n_1$ represents the number of degrees of freedom available for estimating
$S'^2$, and $n_2$ represents the number of degrees of freedom available for estimating
$S^2$. It is apparent from equation (5) that $r - 1$ degrees of freedom are
available for the estimation of $S'^2$ in the particular problem under discussion.
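
As a rough illustration of the comparison just described, the Python sketch below computes the variance between samples, the variance within samples, and $z$ as one-half of the natural logarithm of their ratio. The function name and the data are hypothetical; the formulas follow the weighted mean (4) and the between-samples estimate (5) as read from the text.

```python
from math import log

def one_way_analysis(samples):
    """Return (variance between, variance within, z, df between, df within)."""
    r = len(samples)
    sizes = [len(s) for s in samples]
    means = [sum(s) / len(s) for s in samples]
    # weighted mean of the sample means, equation (4)
    grand = sum(n * m for n, m in zip(sizes, means)) / sum(sizes)
    ss_between = sum(n * (m - grand) ** 2 for n, m in zip(sizes, means))
    ss_within = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    df_between, df_within = r - 1, sum(sizes) - r
    var_between = ss_between / df_between  # equation (5)
    var_within = ss_within / df_within     # variance within samples
    # the text computes z when the variance between samples is the larger one
    z = 0.5 * log(var_between / var_within)
    return var_between, var_within, z, df_between, df_within

# Hypothetical measurements in two samples.
print(one_way_analysis([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0, 8.0]]))
```

The resulting $z$ would then be referred to the tabulated distribution with $n_1 = r - 1$ and $n_2$ equal to the degrees of freedom within samples.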

When any estimate of the variance of a single measurement is multiplied by 
the number of degrees of freedom available for making that estimate, the re- 
sulting product is known as a “sum of squares.” The additive property of 
the sums of squares and the degrees of freedom contributes much to the elegance 
of the scheme of analysis just presented and is of considerable practical impor- 
tance in problems of a type to be discussed later in this paper. In the case 
of the problem discussed above, the additive property of the sums of squares 
provides that the sum of the “sum of squares between samples” and the "sum 
of squares within samples” is equal to the sum of the squares of the deviations 
of all of the measurements from their arithmetic mean. The additive property 
of the degrees of freedom provides that the sum of the “degrees of freedom 
between samples” and the “degrees of freedom within samples” is equal to the 
"total degrees of freedom” which is nothing more than the total number of 
measurements diminished by unity. 
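
The additive property can be verified numerically. The following Python sketch, with a hypothetical function name and made-up data, computes the three sums of squares separately so that the identity can be checked.

```python
def sums_of_squares(samples):
    """Return (between samples, within samples, total) sums of squares."""
    all_x = [x for s in samples for x in s]
    grand = sum(all_x) / len(all_x)
    means = [sum(s) / len(s) for s in samples]
    between = sum(len(s) * (m - grand) ** 2 for s, m in zip(samples, means))
    within = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    total = sum((x - grand) ** 2 for x in all_x)
    return between, within, total

samples = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0, 8.0]]  # hypothetical measurements
between, within, total = sums_of_squares(samples)
print(between + within, total)

# degrees of freedom: (r - 1) + (N - r) = N - 1
r, n_total = len(samples), sum(len(s) for s in samples)
print((r - 1) + (n_total - r), n_total - 1)
```

Both printed pairs agree (up to rounding in the first), which is the additive property stated above.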

The methods of analysis presented above may be applied to any study of the 
effects of a number of experimental treatments of the same kind upon the 
magnitude of a measurable quantity. If experimental treatments of more 
than one kind are imposed simultaneously, the effects of each may be studied 
by modifications of those methods. The discussion of those modifications, 
about to be presented in this paper, is limited to data which may be classified 
in an “r × s” table, i.e., to studies of the effects of only two kinds of
experimental treatments. More complex problems may be treated by simple
extensions of the methods presented.

Consider an “r × s” table composed of $rs$ cells, each of which contains a
number of measurements of some quantity. The magnitude of the quantity
measured may vary from cell to cell, but the essential conditions under which
the measurements were made must be the same for all cells. It is also
understood that no cell may be empty. Table 1 is an example of such a table. The
individual measurements have not been represented. Only the number of
measurements, $n_{ij}$, in each cell and the arithmetic mean, $m_{ij}$, of those
measurements have been indicated. The arguments, $a_i$, represent $r$ experimental
treatments of one kind, and the arguments, $b_j$, represent $s$ experimental
treatments of another kind. The problem to be solved is to ascertain whether or
not the differences among the experimental treatments of each kind had any
effect on the magnitude of the quantity measured.

TABLE 1

Example of an “r × s” Table Showing Only the Number of Measurements in
Each Cell and the Arithmetic Mean of Those Measurements

[Table: rows $a_1, \dots, a_r$; columns $b_1, b_2, b_3, b_4, \dots, b_s$; each cell shows $n_{ij}$ and $m_{ij}$.]



If each cell contains the same number of measurements, the effects of the
experimental treatments indicated by the arguments, $a_i$, may be studied by
comparing the variance “between rows” with the variance “within cells.” The
variance between rows may be calculated by regarding the $r$ rows as $r$ samples
of measurements and applying an equation of the same form as equation (5).
The variance within cells may be obtained by calculating the variance of a
single measurement from the data in each cell separately and taking the mean
of the resulting values. The effects of the experimental treatments indicated
by the arguments, $b_j$, may be studied by comparing the variance “between
columns” with the variance “within cells.”

If the degrees of freedom between rows, between columns, and within cells
are added, the sum will be less than the total number of degrees of freedom
in the table. If the corresponding sums of squares are added, the sum is likely
to be less than the total sum of squares. The differences are due to what is
customarily designated as “interaction between rows and columns.” The
more descriptive term, “differential response,” is sometimes used to designate
the same factor. The nature of this factor may be investigated by considering
the effects of the experimental treatments, $b_j$, in each row of Table 1.

The data in each cell of Table 1 may be regarded as a sample of
measurements. Therefore, the data in any row may be regarded as a set of $s$ samples
of measurements. By applying an equation of the same form as equation (5)
to the data in any row, an estimate of the variance of a single measurement is
obtained from the observed distribution of the means of the cells in that row.
By calculating the arithmetic mean of the estimates for the $r$ rows, an estimate
of the variance of a single measurement is obtained from $r(s - 1)$ degrees of
freedom. This estimate may be designated as the variance “between cells in
the same row.”

The variance between cells in the same row measures the average effect of
differences among the experimental treatments, $b_j$, in individual rows. The
variance between columns, which was discussed earlier in this paper, is
calculated from $s - 1$ degrees of freedom and measures the effect of differences
among the treatments, $b_j$, on the assumption that the effect of any one
treatment upon the magnitude of the quantity measured was constant for every row.
The number of degrees of freedom assignable to differential response of the
various rows to the treatments, $b_j$, is $r(s - 1) - (s - 1)$ or $(r - 1)(s - 1)$.
The sum of squares due to differential response is given by the difference
between the sum of squares between cells in the same row and the sum of squares
between columns. These relations follow from the additive property of degrees
of freedom and sums of squares.
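
For the case of equal numbers in every cell, the subtraction described above can be sketched in Python. The function names, the nested-list layout `cells[i][j]` and the figures are illustrative assumptions only.

```python
def mean(xs):
    return sum(xs) / len(xs)

def two_way_equal_cells(cells):
    """cells[i][j] lists the measurements for row a_i and column b_j.

    Every cell is assumed to hold the same number of measurements.
    Returns (SS between columns, SS between cells in the same row,
             SS due to differential response, its degrees of freedom).
    """
    r, s = len(cells), len(cells[0])
    k = len(cells[0][0])  # measurements per cell
    cell_means = [[mean(c) for c in row] for row in cells]
    row_means = [mean(row) for row in cell_means]
    col_means = [mean([cell_means[i][j] for i in range(r)]) for j in range(s)]
    grand = mean(row_means)
    ss_columns = r * k * sum((c - grand) ** 2 for c in col_means)
    ss_cells_in_rows = k * sum(
        (cell_means[i][j] - row_means[i]) ** 2 for i in range(r) for j in range(s)
    )
    ss_interaction = ss_cells_in_rows - ss_columns
    df_interaction = r * (s - 1) - (s - 1)  # equals (r - 1)(s - 1)
    return ss_columns, ss_cells_in_rows, ss_interaction, df_interaction

# Hypothetical 2 x 2 table with two measurements per cell.
cells = [[[1.0, 3.0], [2.0, 4.0]], [[2.0, 4.0], [7.0, 9.0]]]
print(two_way_equal_cells(cells))
```

For this made-up table the sum of squares between cells in the same row is 26, that between columns is 18, and the difference, 8, is assigned to differential response on 1 degree of freedom.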

It may be observed that precisely the same results would be obtained by
considering the effects of the treatments, $a_i$, in the various columns of Table 1.
The degrees of freedom and sum of squares due to differential response of the
various columns to the treatments, $a_i$, would be exactly equal to the
corresponding values obtained for the differential response of the various rows to the
treatments, $b_j$.

Up to this point the discussion has been concerned only with the special case 
in which each cell of Table 1 contains the same number of measurements. As 
a matter of fact, the methods given for the analysis of such data will yield 
correct results when applied to any “r X s” table in which the numbers of 
measurements in the cells in every row are proportional to the corresponding 
marginal totals for the columns, and the numbers of measurements in the cells 
in every column are proportional to the corresponding marginal totals for the 
rows. 

When the numbers of measurements in the various cells do not satisfy the 
above condition of proportionality, the distributions of the means of the rows 
and columns may be distorted, and, consequently, the methods of analysis 
described above may yield incorrect results. Efficient methods of analyzing 
such data have been presented by Yates (1933). A comprehensive discussion 
of these methods is considerably beyond the scope of this paper. One method,
described very briefly by Yates (1933) and designated as the “method of
weighted squares of means,” appealed to the author as being particularly 
valuable for practical work. No detailed discussion of the method seems to 
be available in the literature. Therefore, the following presentation may be 
of some interest. 

Consider the experimental treatments represented by the arguments, $a_i$, in
Table 1. It is necessary to find an average value for the magnitude of the
quantity measured for each row of Table 1. However, this average must be
of such a type that its value will not be distorted by the unequal numbers of
measurements in the various cells. The unweighted arithmetic mean of the
means of the cells in the row seems to be the logical average to use since, within
the limits of sampling fluctuations, the value of this average will be identical
with the value which would have been obtained if each cell had contained the
same number of measurements. The averages for the $r$ rows are:

$$
\begin{aligned}
m_{a_1} &= \frac{1}{s}\,(m_{11} + m_{12} + \dots + m_{1s}) \\
m_{a_2} &= \frac{1}{s}\,(m_{21} + m_{22} + \dots + m_{2s}) \\
&\;\;\vdots \\
m_{a_r} &= \frac{1}{s}\,(m_{r1} + m_{r2} + \dots + m_{rs})
\end{aligned}
\tag{8}
$$


By the law of propagation of error, the variance of any one of these unweighted
means is given by the equation:

$$S_{a_i}^2 = \frac{1}{s^2}\left(S_{i1}^2 + S_{i2}^2 + \dots + S_{is}^2\right) \tag{9}$$

in which $S_{a_i}^2$ is the variance of $m_{a_i}$, and $S_{i1}^2, S_{i2}^2, \dots, S_{is}^2$ are the variances of
$m_{i1}, m_{i2}, \dots, m_{is}$, respectively. If $S^2$ represents the variance of a single
measurement, equation (9) may be written in the form:

$$S_{a_i}^2 = \frac{S^2}{s^2}\left(\frac{1}{n_{i1}} + \frac{1}{n_{i2}} + \dots + \frac{1}{n_{is}}\right) \tag{10}$$

The value of $S^2$ may be estimated from the individual measurements in the
various cells. $S^2$ is nothing more than the variance within cells, as customarily
calculated, and may be estimated from the $N - rs$ degrees of freedom within
cells, in which $N$ represents the total number of measurements in Table 1.

The variance of a single measurement may also be estimated from the observed
distribution of the means of the type, $m_{a_i}$. These means are not of equal weight.
Therefore, in order to find the variance of any one of them, it is first necessary
to calculate the weighted mean of the $r$ individual means. Since the weight of
an arithmetic mean is inversely proportional to its variance, it is evident from
an inspection of equation (10) that the weight, $p_{a_i}$, of a mean, $m_{a_i}$, may be
found from the equation:

$$\frac{1}{p_{a_i}} = \frac{1}{n_{i1}} + \frac{1}{n_{i2}} + \dots + \frac{1}{n_{is}} \tag{11}$$

The weighted mean, $m_a$, may then be found:

$$m_a = \frac{p_{a_1} m_{a_1} + p_{a_2} m_{a_2} + \dots + p_{a_r} m_{a_r}}{p_{a_1} + p_{a_2} + \dots + p_{a_r}} \tag{12}$$


The variance, $S_{a_i}'^2$, of any mean, $m_{a_i}$, as estimated from the observed distribution
of means of this type, is given by:

$$S_{a_i}'^2 = \frac{1}{p_{a_i}(r - 1)} \left[ p_{a_1}(m_{a_1} - m_a)^2 + p_{a_2}(m_{a_2} - m_a)^2 + \dots + p_{a_r}(m_{a_r} - m_a)^2 \right] \tag{13}$$

By substituting $S_{a_i}'^2$ for $S_{a_i}^2$, and $S_a^2$ for $S^2$, in equation (10) and solving the
resulting equation for $S_a^2$, an estimate, $S_a^2$, of the variance of a single
measurement is obtained from the observed distribution of means of the type, $m_{a_i}$. It
is evident that, after making the indicated substitutions, equation (10) reduces
to the form:

$$S_a^2 = \frac{s^2}{r - 1} \left[ p_{a_1}(m_{a_1} - m_a)^2 + p_{a_2}(m_{a_2} - m_a)^2 + \dots + p_{a_r}(m_{a_r} - m_a)^2 \right] \tag{14}$$


It is interesting to observe that, if the numbers of measurements in the re- 
spective cells were equal, equation (14) would reduce to the formula for calcu- 
lating the variance “between rows” as customarily applied in analysis of 
variance. 

The two estimates, $S^2$ and $S_a^2$, of the variance of a single measurement may
be compared in the usual manner by taking one-half of the natural logarithm
of the ratio of the larger estimate to the smaller and making use of the tables
of the values of “z” given by Fisher (1932). When using these tables, it is
important to remember that $S_a^2$ was estimated from $r - 1$ degrees of freedom.
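
A minimal Python sketch of the row calculation follows, assuming the unweighted row averages (8), weights given by $1/p_{a_i} = \sum_j 1/n_{ij}$ as in (11), the weighted mean (12), and $S_a^2 = \frac{s^2}{r-1} \sum_i p_{a_i}(m_{a_i} - m_a)^2$ as in (14). The function name, argument layout and figures are hypothetical.

```python
def weighted_squares_of_means_rows(n, m):
    """n[i][j]: number of measurements in cell (i, j); m[i][j]: cell mean.

    Returns the estimate of the variance of a single measurement obtained
    from the unweighted row averages, on r - 1 degrees of freedom.
    """
    r, s = len(m), len(m[0])
    row_avg = [sum(row) / s for row in m]                         # equation (8)
    weight = [1.0 / sum(1.0 / nij for nij in row) for row in n]   # equation (11)
    m_a = sum(p * a for p, a in zip(weight, row_avg)) / sum(weight)  # equation (12)
    return s**2 / (r - 1) * sum(
        p * (a - m_a) ** 2 for p, a in zip(weight, row_avg)
    )                                                             # equation (14)

# Hypothetical 2 x 2 table with unequal cell numbers.
print(weighted_squares_of_means_rows([[2, 4], [3, 6]], [[2.0, 3.0], [3.0, 8.0]]))

# With equal cell numbers the result is the ordinary variance between rows.
print(weighted_squares_of_means_rows([[2, 2], [2, 2]], [[2.0, 3.0], [3.0, 8.0]]))
```

In the second call every cell holds two measurements, the weights are all equal, and the value 18 agrees with the usual between-rows calculation for the same cell means.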

The method of analysis just described may be employed to study the effects
of differences among the experimental treatments indicated by the arguments,
$b_j$, on the magnitude of the quantity measured. The unweighted means for
the $s$ columns are:

$$
\begin{aligned}
m_{b_1} &= \frac{1}{r}\,(m_{11} + m_{21} + \dots + m_{r1}) \\
m_{b_2} &= \frac{1}{r}\,(m_{12} + m_{22} + \dots + m_{r2}) \\
&\;\;\vdots \\
m_{b_s} &= \frac{1}{r}\,(m_{1s} + m_{2s} + \dots + m_{rs})
\end{aligned}
\tag{15}
$$

The weight, $p_{b_j}$, of a mean of the type, $m_{b_j}$, may be found from the relation:

$$\frac{1}{p_{b_j}} = \frac{1}{n_{1j}} + \frac{1}{n_{2j}} + \dots + \frac{1}{n_{rj}} \tag{16}$$

A weighted mean, $m_b$, may be calculated:

$$m_b = \frac{p_{b_1} m_{b_1} + p_{b_2} m_{b_2} + \dots + p_{b_s} m_{b_s}}{p_{b_1} + p_{b_2} + \dots + p_{b_s}} \tag{17}$$

An estimate, $S_b^2$, of the variance of a single measurement may be obtained from
the observed distribution of means of the type, $m_{b_j}$, by the use of the equation:

$$S_b^2 = \frac{r^2}{s - 1} \left[ p_{b_1}(m_{b_1} - m_b)^2 + p_{b_2}(m_{b_2} - m_b)^2 + \dots + p_{b_s}(m_{b_s} - m_b)^2 \right] \tag{18}$$

$S_b^2$ may be compared with $S^2$ in the usual manner.

If it is necessary to study the “interaction between rows and columns,” the
effects of the experimental treatments, $b_j$, may be studied for each individual
row of Table 1. Consider the distribution of the means of the cells in a row
designated by the argument, $a_i$. The weight of any one of these means is
equal to the number of measurements in the cell. A weighted mean, $m_i'$, of
the $s$ means of cells in the row may be calculated:

$$m_i' = \frac{n_{i1} m_{i1} + n_{i2} m_{i2} + \dots + n_{is} m_{is}}{n_{i1} + n_{i2} + \dots + n_{is}} \tag{19}$$

The variance, $S_{ij}'^2$, of the mean, $m_{ij}$, for any cell in the given row, as estimated
from the observed distribution of means of this type, may be obtained from the
equation:

$$S_{ij}'^2 = \frac{1}{n_{ij}(s - 1)} \left[ n_{i1}(m_{i1} - m_i')^2 + n_{i2}(m_{i2} - m_i')^2 + \dots + n_{is}(m_{is} - m_i')^2 \right] \tag{20}$$

The variance, $S_{ij}^2$, of the same mean, as estimated from the distribution of the
individual measurements in the cell, may be obtained from the equation:

$$S_{ij}^2 = \frac{S^2}{n_{ij}} \tag{21}$$

By substituting $S_{ij}'^2$ for $S_{ij}^2$, and $S_{a_i b}^2$ for $S^2$, in equation (21) and solving the
resulting equation for $S_{a_i b}^2$, an estimate, $S_{a_i b}^2$, of the variance of a single
measurement is obtained from the observed distribution of the means of the cells
in the given row. After making the indicated substitutions, equation (21)
reduces to the form:

$$S_{a_i b}^2 = \frac{1}{s - 1} \left[ n_{i1}(m_{i1} - m_i')^2 + n_{i2}(m_{i2} - m_i')^2 + \dots + n_{is}(m_{is} - m_i')^2 \right] \tag{22}$$

Such an estimate, $S_{a_i b}^2$, of the variance of a single measurement may be
obtained for each of the $r$ rows in Table 1. By calculating the average, $S_{ab}^2$,
of the variances of the type, $S_{a_i b}^2$, an estimate of the variance of a single
measurement may be obtained from the $r(s - 1)$ degrees of freedom between
cells in the same row:

$$S_{ab}^2 = \frac{1}{r(s - 1)} \sum_{i=1}^{r} \left[ n_{i1}(m_{i1} - m_i')^2 + n_{i2}(m_{i2} - m_i')^2 + \dots + n_{is}(m_{is} - m_i')^2 \right] \tag{23}$$

Equation (23) is identical with the formula for calculating the variance between 
cells in the same row as ordinarily applied in analysis of variance. This result 
is a direct consequence of the fact that the unequal numbers of measurements 
in the various cells had no distorting effect on the arithmetic means for indi- 
vidual cells. 

The presence or absence of interaction may be verified by comparing $S_{ab}^2$
with $S_b^2$. In general, the actual variance due to interaction can not be obtained
by the “weighted squares of means” method, for the various sums of squares 
do not possess the additive property when the analysis is made in this way. 
However, the comparison suggested above will yield sufficient information for 
most practical purposes. 

For the special case in which $r$ or $s$ is equal to 2, the actual variance due to
interaction may be calculated. Suppose $r = 2$ in Table 1. The following
method, suggested by Yates (1933), yields an estimate of the variance due to
interaction from a consideration of the differences, $d_j$, between the means of
the two cells in each column:

$$
\begin{aligned}
d_1 &= m_{11} - m_{21} \\
d_2 &= m_{12} - m_{22} \\
&\;\;\vdots \\
d_s &= m_{1s} - m_{2s}
\end{aligned}
\tag{24}
$$

The variance, $S_{d_j}^2$, of any difference, $d_j$, is given by the equation:

$$S_{d_j}^2 = S^2 \left( \frac{1}{n_{1j}} + \frac{1}{n_{2j}} \right) \tag{25}$$

The weight, $p_j$, of the difference, $d_j$, is given by the equation:

$$\frac{1}{p_j} = \frac{1}{n_{1j}} + \frac{1}{n_{2j}} \tag{26}$$

The variance of the difference, $d_j$, as estimated from the observed distribution
of differences, is given by the equation:

$$S_{d_j}'^2 = \frac{1}{p_j (s - 1)} \left[ p_1 (d_1 - \bar{d})^2 + p_2 (d_2 - \bar{d})^2 + \dots + p_s (d_s - \bar{d})^2 \right] \tag{27}$$

in which:

$$\bar{d} = \frac{p_1 d_1 + p_2 d_2 + \dots + p_s d_s}{p_1 + p_2 + \dots + p_s} \tag{28}$$

By means of these relations, an estimate, $S_d^2$, of the variance of a single
measurement may be obtained from the observed distribution of the differences of the
type, $d_j$. This estimate represents the variance due to interaction and may be
obtained from the equation:

$$S_d^2 = \frac{1}{s - 1} \left[ p_1 (d_1 - \bar{d})^2 + p_2 (d_2 - \bar{d})^2 + \dots + p_s (d_s - \bar{d})^2 \right] \tag{29}$$
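
The calculation for $r = 2$ can be sketched in Python as follows, assuming the differences $d_j = m_{1j} - m_{2j}$, weights given by $1/p_j = 1/n_{1j} + 1/n_{2j}$, the weighted mean difference $\bar{d}$, and $S_d^2 = \frac{1}{s-1} \sum_j p_j (d_j - \bar{d})^2$. The function name and the figures are made up for illustration.

```python
def interaction_variance_two_rows(n, m):
    """n[i][j], m[i][j]: cell numbers and cell means for a table with two rows.

    Returns the estimate of the variance due to interaction,
    based on s - 1 degrees of freedom.
    """
    s = len(m[0])
    d = [m[0][j] - m[1][j] for j in range(s)]                      # equation (24)
    p = [1.0 / (1.0 / n[0][j] + 1.0 / n[1][j]) for j in range(s)]  # equation (26)
    d_bar = sum(pj * dj for pj, dj in zip(p, d)) / sum(p)          # equation (28)
    return sum(pj * (dj - d_bar) ** 2 for pj, dj in zip(p, d)) / (s - 1)  # (29)

# Hypothetical table: two rows, two columns, two measurements in every cell.
print(interaction_variance_two_rows([[2, 2], [2, 2]], [[2.0, 3.0], [3.0, 8.0]]))
```

With equal numbers in all cells the value obtained, 8 in this example, is the same as the interaction sum of squares of the ordinary analysis divided by its $s - 1$ degrees of freedom.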

It is quite apparent that $s - 1$ degrees of freedom are available for the
estimation of the variance due to interaction in this particular example.



NOTE ON THE DISTRIBUTIONS OF THE STANDARD DEVIATIONS AND SECOND
MOMENTS OF SAMPLES FROM A GRAM-CHARLIER POPULATION

By G. A. Baker


T. N. Thiele in his “Theory of Observations” makes the following statement
with regard to the distributions of the higher half-invariants in samples of $n$:
“Not even for $\mu_2$ have I discovered the general law of errors.”¹ The purpose
of this paper is to shed some light on the distribution of $\mu_2$ and to give the
distribution of second moments about a fixed point when the sampled population
can be represented by a Gram-Charlier series.

The distribution of the second moments about a fixed point of samples is 
given in complete generality. It is known that if the sampled population is 
normal there is a simple relation between the distribution of the standard 
deviations of samples of n and the distribution of the second moments of the 
samples about the mean of the population. It was thought that such a relation 
might exist in case the sampled population could be represented by a Gram- 
Charlier series. Such is not the case. Again, it was thought that by obtaining 
the distribution of the standard deviations for samples of 2, 3, 4, • • • it might 
be possible to deduce empirically a general law of distribution. This proved an 
unfruitful line of investigation but required so much labor that the results 
should be reported to save others time and energy. 

First, suppose that a population may be represented as

(1) $f(x) = a_0 \varphi_0(x) + a_3 \varphi_3(x) + a_5 \varphi_5(x) + \dots$

where

$$\varphi_i(x) = \frac{d^i \varphi_0(x)}{dx^i}$$


Then applying Theorem II of the author's paper on "Random Sampling from
Non-Homogeneous Populations"² we deduce at once the following theorem.

Theorem I. The distribution of the second moments about the origin of
(1) of samples of n drawn at random from a population represented by (1) is
precisely the same as the distribution of the second moments about the same
point of samples of n drawn from a population represented by the first term of
(1), that is a normal population, and is proportional to
$x^{\frac{n-2}{2}} e^{-\frac{nx}{2}}$ (loc. cit.).

¹ Thiele, T. N., "The Theory of Observations," reprinted in the Annals of Mathematical
Statistics, Vol. 2, No. 2, May, 1931, p. 208.

² Metron, Vol. 8, No. 3, Feb. 28, 1930.
This is not so surprising as it may seem at first if it is remembered that the
odd subscript terms of a Gram-Charlier series slice off frequencies on one side
of the mean of $a_0\varphi_0(x)$ and add them onto the other side in the same manner.
If we suppose that a population is given as

$$f(x) = a_0\varphi_0(x) + a_3\varphi_3(x) + a_4\varphi_4(x) + \cdots \tag{2}$$

in the same manner we get the following theorem.

Theorem II. The distribution of the second moments measured from the
origin of (2) of samples of n drawn at random from (2) will be a combination
of distributions of the type of Theorem I with only even subscript terms
contributing anything. The variations in the component distributions will consist
of differences in the constant factors and the exponent of x, the estimate of the
second moment. The lowest exponent will be $\frac{n-2}{2}$.

For instance, if

$$f(x) = a_0\varphi_0(x) + a_3\varphi_3(x) + a_4\varphi_4(x) \tag{3}$$

and n = 2, the estimates of the second moment will be distributed as
proportional to

$$e^{-x}\left[(a_0 + 3a_4)^2 - 12a_4(a_0 + 3a_4)x + (36a_4^2 + 6a_0a_4 + 18a_4^2)\frac{x^2}{2!} - 36a_4^2\frac{x^3}{3!} + 9a_4^2\frac{x^4}{4!}\right].$$


Thus, it can be said that we know the distribution of the second moments of 
samples about a fixed point if the sampled population is of the Gram-Charlier 
type in the sense that given the number of terms necessary for an adequate 
representation and the number in the samples we can write down the desired
distribution. However, this is not a simple matter. Further, if some relation 
existed between the distributions of the second moments about a fixed point 
and the standard deviations of the samples we would know the latter distribution 
also. Such a relation is not apparent for samples of 2 and 3. 
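
The n = 2 case of (3) can be checked numerically. The sketch below is only an illustration under stated assumptions: $\varphi_0$ is taken to be the unit normal density, $\varphi_i$ its i-th derivative, and the second moment of a sample is $m = (x_1^2 + x_2^2)/2$. The function names and the trial coefficients are hypothetical. It averages $f(x_1)f(x_2)/\varphi_0(x_1)\varphi_0(x_2)$ around the circle $x_1^2 + x_2^2 = 2m$ and compares the result with the bracketed polynomial that multiplies $e^{-m}$.

```python
import math

def phi_ratio(x, a0, a3, a4):
    """f(x)/phi0(x) for f = a0*phi0 + a3*phi3 + a4*phi4 (phi_i = i-th derivative of phi0)."""
    h3 = x**3 - 3*x              # phi3 = -(x^3 - 3x) * phi0
    h4 = x**4 - 6*x**2 + 3       # phi4 = +(x^4 - 6x^2 + 3) * phi0
    return a0 - a3*h3 + a4*h4

def angular_average(m, a0, a3, a4, steps=720):
    """Mean of f(x1) f(x2) / (phi0(x1) phi0(x2)) over the circle x1^2 + x2^2 = 2m."""
    r = math.sqrt(2*m)
    total = 0.0
    for k in range(steps):
        t = 2*math.pi*k/steps
        total += (phi_ratio(r*math.cos(t), a0, a3, a4)
                  * phi_ratio(r*math.sin(t), a0, a3, a4))
    return total/steps

def bracket(m, a0, a4):
    """Polynomial multiplying e^{-m} in the n = 2 distribution of the second moment."""
    return ((a0 + 3*a4)**2 - 12*a4*(a0 + 3*a4)*m
            + (36*a4**2 + 6*a0*a4 + 18*a4**2)*m**2/2
            - 36*a4**2*m**3/6 + 9*a4**2*m**4/24)
```

The odd term $a_3\varphi_3$ is passed to `angular_average` but drops out of the average, which is the content of Theorem II for this case.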

Let us investigate the correlation surfaces of the means and standard
deviations of samples of 2 and 3 drawn at random from a population represented
by the first few terms of a Gram-Charlier series after the method of Dr. A. T.
Craig.³ The distributions of the standard deviations can then be obtained
immediately by integration.

Suppose that

$$f(x) = a_0\varphi_0(x) + a_3\varphi_3(x) + a_4\varphi_4(x) \tag{4}$$

³ Annals of Mathematical Statistics, Vol. 3, No. 2, May, 1932, pp. 126-140.
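and that we are considering samples of 2. The probability of the concurrence
of $x_1$ and $x_2$ is

$$f(x_1)\,f(x_2)\,dx_1\,dx_2 \tag{5}$$

and

$$x_1 = -s + \bar{x}, \qquad x_2 = s + \bar{x} \tag{6}$$

where s is the standard deviation and $\bar{x}$ is the mean of a sample of 2. By means
of (6), (5) becomes

$$\begin{aligned}
\frac{2}{\pi}e^{-(s^2+\bar{x}^2)}\big[\,&a_0^2 + a_0a_3(-6s^2\bar{x} - 2\bar{x}^3 + 6\bar{x}) \\
&+ a_0a_4(2s^4 + 12s^2\bar{x}^2 - 12s^2 + 2\bar{x}^4 - 12\bar{x}^2 + 6) \\
&+ a_3^2(-s^6 + 6s^4 + 3s^4\bar{x}^2 - 3s^2\bar{x}^4 - 9s^2 + 9\bar{x}^2 - 6\bar{x}^4 + \bar{x}^6) \\
&+ a_3a_4(2s^6\bar{x} - 6s^4\bar{x}^3 - 6s^4\bar{x} + 6s^2\bar{x}^5 - 12s^2\bar{x}^3 + 18s^2\bar{x} - 2\bar{x}^7 + 18\bar{x}^5 - 42\bar{x}^3 + 18\bar{x}) \\
&+ a_4^2(s^8 - 4s^6\bar{x}^2 - 12s^6 + 6s^4\bar{x}^4 + 12s^4\bar{x}^2 + 42s^4 - 4s^2\bar{x}^6 \\
&\qquad + 12s^2\bar{x}^4 - 36s^2\bar{x}^2 - 36s^2 + \bar{x}^8 - 12\bar{x}^6 + 42\bar{x}^4 - 36\bar{x}^2 + 9)\big]\,ds\,d\bar{x}.
\end{aligned} \tag{7}$$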

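To find the distribution of s we must integrate from $-\infty$ to $\infty$ with respect
to $\bar{x}$. Thus, (8) is obtained.

$$\frac{2}{\sqrt{\pi}}e^{-s^2}\left[a_0^2 + a_0a_4\left(2s^4 - 6s^2 + \frac{3}{2}\right) + a_3^2\left(-s^6 + \frac{15}{2}s^4 - \frac{45}{4}s^2 + \frac{15}{8}\right) + a_4^2\left(s^8 - 14s^6 + \frac{105}{2}s^4 - \frac{105}{2}s^2 + \frac{105}{16}\right)\right] \tag{8}$$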
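
The passage from (7) to (8) amounts to replacing each power of the sample mean by its average under the weight $e^{-\bar{x}^2}/\sqrt{\pi}$. As a hedged illustration, the following sketch rebuilds the $a_4^2$ term from $x_1 = -s + \bar{x}$, $x_2 = s + \bar{x}$ and integrates out the mean with exact fractions. It assumes $\varphi_4 = (x^4 - 6x^2 + 3)\varphi_0$ with $\varphi_0$ the unit normal density; the helper names are illustrative and not from the paper.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Multiply two polynomials in (s, xbar) stored as {(power_s, power_xbar): coeff}."""
    out = {}
    for (i, j), a in p.items():
        for (k, l), b in q.items():
            key = (i + k, j + l)
            out[key] = out.get(key, 0) + a*b
    return out

def poly_pow(p, n):
    out = {(0, 0): Fraction(1)}
    for _ in range(n):
        out = poly_mul(out, p)
    return out

def h4(lin):
    """x^4 - 6x^2 + 3 evaluated at the linear form `lin`."""
    out = dict(poly_pow(lin, 4))
    for key, c in poly_pow(lin, 2).items():
        out[key] = out.get(key, 0) - 6*c
    out[(0, 0)] = out.get((0, 0), 0) + 3
    return out

def integrate_out_mean(p):
    """Replace xbar^j by its mean under the weight exp(-xbar^2)/sqrt(pi)."""
    out = {}
    for (i, j), c in p.items():
        if j % 2:
            continue                      # odd powers of the mean integrate to zero
        moment = Fraction(1)
        for k in range(1, j, 2):          # (j-1)!! / 2^(j/2)
            moment *= Fraction(k, 2)
        out[i] = out.get(i, 0) + c*moment
    return {i: c for i, c in out.items() if c != 0}

x1 = {(1, 0): Fraction(-1), (0, 1): Fraction(1)}   # x1 = -s + xbar
x2 = {(1, 0): Fraction(1), (0, 1): Fraction(1)}    # x2 =  s + xbar
a4_squared_term = integrate_out_mean(poly_mul(h4(x1), h4(x2)))
```

The resulting dictionary maps each power of s to its coefficient in the $a_4^2$ term of (8).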


If we retain only two terms of (3), i.e. use 
(9) /(x) = Oo<po(x) 4 - a^piix) 

and consider samples of 3 we obtain as the correlation surface of x and s 

al - (_40x* + 24x8^ - 24x) 


1871 
— se 
\/3 




+ (-84s® 4 - 525x*s‘ - 2752x^8^ 

d 4 

(10) 4 - 576s^ - 1008x^8® - 288s® - 5586x® 4 - 270x' - 1728x’*) 

+ ^(28s‘ - 6189x*s® - 28x"s* - 629x® + 288s" 4 - 1344x" 

64 

4 - 4608x=s® - 288s® 4 - 729x®)"|. 
The distribution of s can be obtained as before. The processes involved in
obtaining (7) and (10) are so complicated that the general rule for writing the 
distribution of s is not apparent. Also, the relation of the distributions of s to 
the corresponding distributions of the second moments about a fixed point is 
not apparent. 

In summary, the general distributions of the second moments about a fixed 
point of samples from a population represented by a definite number of terms 
of a Gram-Charlier series and the distributions of the standard deviations of 
samples of 2 and 3 from the same type of population are given and compared. 
No apparent relation exists between them. 



ON THE FINITE DIFFERENCES OF A POLYNOMIAL 
By L H. Barkey 

In this paper an apparently new and convenient method of finding the suc- 
cessive finite differences of a polynomial is considered. If operationally 

$$\varphi(x + ma) = E_a^{m}\varphi(x) = (1 + \Delta_a)^{m}\varphi(x)$$

then for any polynomial f(x) of degree "n"

$$\begin{aligned}
f(x) &= p_0x^n + p_1x^{n-1} + \cdots + p_n \\
&= p_0(x+a)^n + q_{11}(x+a)^{n-1} + \cdots + q_{1n} \\
E_af(x) &= p_0(x+a)^n + p_1(x+a)^{n-1} + \cdots + p_n \\
\Delta_af(x) &= (p_1 - q_{11})(x+a)^{n-1} + (p_2 - q_{12})(x+a)^{n-2} + \cdots + (p_n - q_{1n})
\end{aligned}$$

Similarly, if $f_1(x) = \Delta_af(x)$, then

$$\begin{aligned}
f_1(x) &= (p_1 - q_{11})(x+2a)^{n-1} + q_{22}(x+2a)^{n-2} + \cdots + q_{2n} \\
E_af_1(x) &= (p_1 - q_{11})(x+2a)^{n-1} + (p_2 - q_{12})(x+2a)^{n-2} + \cdots + (p_n - q_{1n}) \\
\Delta_af_1(x) &= (p_2 - q_{12} - q_{22})(x+2a)^{n-2} + \cdots + (p_n - q_{1n} - q_{2n})
\end{aligned}$$

and so on for the higher orders, since $\Delta_af_{r-1}(x) = \Delta_a^rf(x)$. In the practical
application of this method, "a" may be conveniently taken as unity, and an
abridged form of synthetic division employed. Thus, if

$f(x) = 5x^4 + 3x^3 + 7x^2 - 2x + 3$, then


     5     +3     +7     -2     +3
           -2     +9    -11    +14
           -7    +16    -27
          -12    +28
          -17
    20    -21    +25    -11
          -41    +66    -77
          -61   +127
          -81
    60   -102    +66
         -162   +228
         -222
   120   -162
         -282
   120
As is evident from the darkened numerals, all figures to the right of the dotted
line are redundant and may be omitted. From the above,

$\Delta f(x) = 20(x+1)^3 - 21(x+1)^2 + 25(x+1) - 11$

$\Delta^2f(x) = 60(x+2)^2 - 102(x+2) + 66$

$\Delta^3f(x) = 120(x+3) - 162$

$\Delta^4f(x) = 120.$
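
The procedure can be sketched in a few lines of code. This is a minimal illustration, not the author's own layout: the function names are hypothetical, and the abridgement that omits the redundant figures is not attempted. Each pass re-expands the current polynomial in powers of (t + a) by repeated synthetic division and subtracts the new coefficients from the old ones.

```python
def shift_expand(coeffs, a=1):
    """Rewrite sum(c[k] * t**(n-k)) in powers of (t + a) by repeated synthetic division."""
    work = list(coeffs)
    shifted = []
    while work:
        # One synthetic division by (t + a); the last entry becomes the remainder.
        for k in range(1, len(work)):
            work[k] -= a * work[k - 1]
        shifted.append(work.pop())
    return shifted[::-1]          # highest power first


def successive_differences(coeffs, a=1):
    """Coefficient lists of the successive differences; the j-th list is in powers of (x + j*a)."""
    results = []
    current = list(coeffs)
    while len(current) > 1:
        q = shift_expand(current, a)
        # E f keeps the old coefficients p in powers of (t + a); subtract the re-expanded q.
        current = [p - qk for p, qk in zip(current[1:], q[1:])]
        results.append(current)
    return results
```

Calling `successive_differences([5, 3, 7, -2, 3])` runs the same arithmetic as the worked example for $f(x) = 5x^4 + 3x^3 + 7x^2 - 2x + 3$.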



SOME PRACTICAL INTERPOLATION FORMULAS 

By John L. Roberts

Sometimes we wish to find by means of interpolation an approximation to a
particular value of $w_x$ in the interval between the known values, $w_0$ and $w_1$.
But it also might be desirable in the interval from $w_0$ to $w_1$ to interpolate several
approximations to $w_x$ at equidistant values of x. It is very important to know
that a formula which might be very satisfactory to interpolate a particular value
in an interval might seriously fail to be the most satisfactory formula when it
is desired to interpolate several values in the same interval. The range of this
paper is so limited that we only wish to find by means of interpolation several
approximations to the true value of $w_x$ in the interval from $w_0$ to $w_1$ at equidistant
values of x.

One way to perform an interpolation of this sort is to use osculatory
interpolation.¹ The real function of osculatory interpolation is to secure
smoothness at the known points, which are sometimes called pivotal points. By
roughness is meant that one or more of the successive derivatives are
discontinuous at the pivotal points. Experience proves that the osculatory formulas
usually secure smoothness either at the expense of labor or by a loss of accuracy
over the entire range from $w_0$ to $w_1$. Frequently the function of interpolation
formulas is to save labor. In many cases it appears reasonable to save labor 
by a loss of both smoothness and accuracy. Formulas are herein selected, 
without direct regard for smoothness, so as to secure the best possible compro- 
mise between a maximum of accuracy and a minimum of labor. It appears 
that this results in many cases in a loss of smoothness that is no more objection- 
able than the loss in accuracy. 

The actuarial profession, while trying to perfect their methods of constructing 
mortality tables, have made contributions of a high order of scholarship to the 
theory of osculatory interpolation. But since the statistician, the astronomer, 
the physicist, and other scientists also have occasions to make interpolations, 
it seems to be very important to discuss the problem of finding the most prac- 
tical methods of interpolation, not only from the special viewpoint of the 
actuary, but also from the general viewpoint of mathematics. 

$\Delta w_x$ is called the first difference of $w_x$, and may be defined by $\Delta w_x = w_{x+1} - w_x$.

¹ Since this paper presupposes certain knowledge on the part of the reader, it may be
worth while to indicate some sources of this knowledge. The elementary parts of this
knowledge can be found in any good book on finite differences. "Population Statistics
and Their Compilation" by Hugh H. Wolfenden, published by the Actuarial Society of
America, contains an excellent summary of osculatory interpolation. This summary
indicates some valuable sources of information.
Second, third, and higher differences are merely successive differences of the
first. When use is made of central difference interpolation formulas, it is
convenient to adopt Woolhouse's notation, which is defined by means of the
following equations: $\Delta w_{-2} = a_{-2}$, $\Delta w_{-1} = a_{-1}$, $\Delta w_0 = a_1$, $\Delta w_1 = a_2$,
$\Delta^2w_{-1} = b_0$, $\Delta^2w_0 = b_1$, $\Delta^3w_{-2} = c_{-1}$, $\Delta^3w_{-1} = c_1$, $\Delta^4w_{-2} = d_0$,
$\Delta^5w_{-2} = e_1$, $\Delta^6w_{-3} = f_0$, etc.

An important family of curves can be represented by

$$u_x = u_0 + xa_1 + \frac{1}{2}x(x-1)B + \frac{1}{6}x(x-1)\left(x - \frac{1}{2}\right)C. \tag{1}$$

Assume $u_0 = w_0$ and $\Delta u_0 = \Delta w_0$. Then a study of (1) shows that $a_1$, which
has already been defined, must be a factor in the second term in order that (1)
may be satisfied when x = 1. (1) is a third degree equation. However, if
C = 0, (1) becomes a second degree equation; if both B = 0 and C = 0, (1)
becomes a first degree equation. In other words, by giving B and C proper
values, (1) can be made to become many different interpolation formulas.

For many purposes interpolation by a first degree formula is not sufficiently
accurate. We, therefore, might wish to interpolate by either a second or a
third degree formula. Since it is possible to draw an unlimited number of
second degree curves or third degree curves between the points $P_0$ and $P_1$, the
problem of selecting the best second degree interpolation curve and the best
third degree curve is of great practical importance.

I 

Suppose that $w_{-2}$, $w_{-1}$, $w_0$, $w_1$, $w_2$, and $w_3$ can be found in a table of values
of the function $w_x$, and that we wish to find by means of interpolation several
approximate values of $w_x$ in the interval from $w_0$ to $w_1$. These six given values
of $w_x$ can be used to determine six pivotal points, which determine a fifth degree
curve. Suppose this curve represents the function $v_x$. Then $w_x$ and $v_x$ would
have exactly the same values at the six pivotal points, but would have values
which are only approximately the same at other points. Using the first six
terms of the Gauss central difference interpolation formula, we have

$$v_x = v_0 + xa_1 + \frac{1}{2}x(x-1)b_0 + \frac{1}{6}(x+1)x(x-1)c_1 + \frac{1}{24}(x+1)x(x-1)(x-2)d_0 + \frac{1}{120}(x+2)(x+1)x(x-1)(x-2)e_1.$$

It is proper to use in this formula the differences $a_1$, $b_0$, etc., which have already
been defined as differences of $w_x$ because these differences are exactly equal to
the corresponding differences of $v_x$. Suppose $P_0$, $P_{1/3}$, $P_{2/3}$, and $P_1$ are four points
which are determined by $v_x$. Then B and C can be determined so that (1) will
represent the curve which can go through these four points.

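Then

$$u_{1/3} = u_0 + \frac{1}{3}a_1 - \frac{1}{9}B + \frac{1}{162}C$$

and

$$v_{1/3} = v_0 + \frac{1}{3}a_1 - \frac{1}{9}b_0 - \frac{4}{81}c_1 + \frac{5}{243}d_0 + \frac{7}{729}e_1.$$

Also

$$u_{2/3} = u_0 + \frac{2}{3}a_1 - \frac{1}{9}B - \frac{1}{162}C$$

and

$$v_{2/3} = v_0 + \frac{2}{3}a_1 - \frac{1}{9}b_0 - \frac{5}{81}c_1 + \frac{5}{243}d_0 + \frac{8}{729}e_1.$$

Since $u_{1/3} = v_{1/3}$ and $u_{2/3} = v_{2/3}$, we have two equations, which can be solved for B
and C.

$$B = b - \frac{5}{27}d \quad\text{and}\quad C = c_1 - \frac{1}{9}e_1 \tag{2}$$

where b and d are defined by

$$b = \frac{1}{2}(b_0 + b_1) \quad\text{and}\quad d = \frac{1}{2}(d_0 + d_1).$$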

A study of (1) shows that $u_{1/2}$ does not depend upon C because the term
containing C becomes zero when $x = \frac{1}{2}$, and also shows that $u_x$ over the entire range
from $u_0$ to $u_1$ is more sensitive to errors in B than errors in C. The B in (2)
usually contains some error because the six terms of the Gauss formula which
were used in determining B usually produce results which are only approximate.
Consequently a comparatively large error in C would not produce an important
error.

Assume

$$B = b - \frac{5}{27}d \quad\text{and}\quad C = c_1 - \frac{5}{27}e_1. \tag{3}$$

B is the same in both (2) and (3), but C is not the same. The accuracy of
(2) and the accuracy of (3) do not differ by an important amount. On the
other hand, if any attempt to apply (2) is compared with the working
illustrations of (3) in this article, it will be found that (2) to an important extent is
more laborious than (3). Therefore (3) is a better compromise between a
maximum of accuracy and a minimum of labor than (2). For this reason (2)
ought not to be regarded as a practical formula. On the other hand (2) because
of its great accuracy serves as an ideal with which other formulas can be
compared. In other words (2) is of theoretical importance.
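
The constants in (2) can be recovered by solving the two matching conditions exactly. The sketch below is an illustration only: it assumes the two interior points lie at x = 1/3 and x = 2/3 (an inference from the constants 5/54 and .1852 that appear later in the paper), and the function names are hypothetical. B and C are returned as coefficient lists on $(b_0, c_1, d_0, e_1)$.

```python
from fractions import Fraction as F

def gauss_coeffs(x):
    """Coefficients of a1, b0, c1, d0, e1 in the six-term Gauss forward formula for v_x - v_0."""
    return [x,
            x*(x - 1)/2,
            (x + 1)*x*(x - 1)/6,
            (x + 1)*x*(x - 1)*(x - 2)/24,
            (x + 2)*(x + 1)*x*(x - 1)*(x - 2)/120]

def fit_B_C(x1, x2):
    """Solve u_x = v_x at x1 and x2 for B and C, each as coefficients of (b0, c1, d0, e1)."""
    rows = []
    for x in (x1, x2):
        g = gauss_coeffs(x)
        # In (1) the multipliers of B and C are x(x-1)/2 and x(x-1)(x-1/2)/6.
        rows.append((x*(x - 1)/2, x*(x - 1)*(x - F(1, 2))/6, g[1:]))
    (p1, q1, r1), (p2, q2, r2) = rows
    det = p1*q2 - p2*q1
    B = [(r1[k]*q2 - r2[k]*q1)/det for k in range(4)]   # Cramer's rule
    C = [(p1*r2[k] - p2*r1[k])/det for k in range(4)]
    return B, C

B, C = fit_B_C(F(1, 3), F(2, 3))
```

Since $b = b_0 + \frac{1}{2}c_1$ and $d = d_0 + \frac{1}{2}e_1$, the list for B reads as $b - \frac{5}{27}d$ and the list for C as $c_1 - \frac{1}{9}e_1$.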

In like manner another interpolation formula can be found if we use the first
four terms of the Gauss formula to determine $P_{1/2}$. Then

$$u_{1/2} = u_0 + \frac{1}{2}a_1 - \frac{1}{8}B$$

and

$$v_{1/2} = v_0 + \frac{1}{2}a_1 - \frac{1}{8}\left(b_0 + \frac{1}{2}c_1\right).$$

Since $u_{1/2} = v_{1/2}$, we can solve for B, and C is left arbitrary. If C = 0, we again
get an excellent compromise between a maximum of accuracy and a minimum
of labor. The following second degree formula results.

$$B = b \quad\text{and}\quad C = 0. \tag{4}$$

In order that the value of (3) and (4) may be appreciated, they are herein 
compared with some other formulas which have been of historical importance. 

If the point $P_{1/2}$ can first be accurately determined, a second degree curve
through the points $P_0$, $P_{1/2}$, and $P_1$ would probably give more accurate results
than such a curve through the points $P_0$, $P_1$, and $P_2$ because the first three
points are in a smaller neighborhood; the second curve can be represented by
the first three terms of the Gregory-Newton interpolation formula. The points
$P_{-1}$, $P_0$, $P_1$, and $P_2$ determine a third degree curve, which can be represented
by the first four terms of the Gauss central difference formula. It is probable
that these terms would determine $P_{1/2}$ much more accurately than the first three
terms of the Gregory-Newton formula because the latter is not a central
difference formula with respect to $P_{1/2}$ and because four terms usually give more
accurate results than only three terms. Consequently there is a strong
probability that (4) is more accurate than the first three terms of the Gregory-
Newton formula. In like manner (4) is more accurate than the first three terms
of the Gauss formula. It is interesting to observe that (4) is the first three
terms of the Newton-Bessel formula.

If B = b and C = $3c_1$,

then (1) is equivalent to Karup's osculatory interpolation formula in terms of
differences taken centrally. B is the same in both (4) and Karup's formula.
No interpolation formula can be very accurate unless C is about equal to $c_1$.
Since, then, the error in C in Karup's formula is about twice as great as the error
in C in (4), his formula is distinctly less accurate than (4). Since (4) is a second
degree curve and Karup's formula is a third degree curve, his formula is very
much more laborious. (4) is extremely accurate for a formula having its labor
saving properties; for many purposes its roughness and inaccuracy appear to
be in about the right proportion. On the other hand Karup's formula is
extremely inaccurate for a formula so laborious; its only good point is its
smoothness.

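Changing somewhat the meanings of u and w, (3) may be written

$$u_{x+n} = w_n + x\Delta w_n + \frac{1}{2}x(x-1)\left[\frac{1}{2}\left(\Delta^2w_{n-1} + \Delta^2w_n\right) - \frac{5}{54}\left(\Delta^4w_{n-2} + \Delta^4w_{n-1}\right)\right] + \frac{1}{6}x(x-1)\left(x - \frac{1}{2}\right)\left(\Delta^3w_{n-1} - \frac{5}{27}\Delta^5w_{n-2}\right).$$

If

$$\frac{du_{x+n}}{dx} = u'_{x+n},$$

then

$$u'_{n+0} - u'_{n-0} = \frac{1}{54}d_0 + \frac{5}{162}f_0,$$

which is the amount of discontinuity in $\frac{du}{dx}$ at $P_0$. (3) has greater smoothness
than (4); in other words (3) is more like an osculatory formula. On the other
hand

$$B = b - \frac{1}{6}d \quad\text{and}\quad C = c_1 - \frac{1}{6}e_1, \tag{5}$$

which is equivalent to an important osculatory interpolation formula by Mr.
Robert Henderson, compares much better with (3) from the viewpoint of labor
saving and accuracy than Karup's formula does with (4).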

II 

An excellent formula can be easily spoiled if the method of applying it is not
practical. Mr. Henderson, in the Transactions of the Actuarial Society of
America, Vol. IX, applies (5) in such a way that the numerical work is very
convenient. Some writers seem to have been very careless about this matter.
A method intended to interpolate several values between $w_0$ and $w_1$ should
provide that the end value $w_1$ shall be exactly reproduced if no error is made in
the computation. In other words a good method should provide a check upon
the work. At the same time, in order to avoid unnecessary labor, the work
should not retain unnecessary decimal places or figures. In other words
fictitious accuracy should be avoided. The following working illustrations are
intended to show good methods of application of formulas and to show how much
labor is necessary in order to apply them; also the size of the errors can be used
to illustrate the theory.
When (4) is applied at either end of the table, where terms are not available
for the calculation of the differences required, it should be assumed that the
fourth differences that cannot be computed vanish and the required differences
should be filled in consistently with that assumption. $\Delta u_x$ represents the first
differences. But it is convenient to have $\delta$ represent the first differences in
such a manner that they are arranged centrally in the working illustration.
$\delta^2$ in like manner represents the second differences. The 2 in $\delta^2$ means that it is a
second difference, and does not have the familiar meaning used in algebra. In
the case of (4), $\Delta u_x = a_1 + xB$, $\Delta^2u_x = B$, and the higher differences all equal
zero. Since we wish in the working illustration of (4) to interpolate four values
between $w_0$ and $w_1$, $\delta$ and $\delta^2$ are defined by $\delta u_x = u_{x+.2} - u_x$ and
$\delta^2u_x = \delta u_{x+.2} - \delta u_x$. It is proved in any good book on finite differences that
there are possibilities that $\Delta$ and $\delta$, which are symbols of operation, can be
separated from the functions upon which they operate, and they can be treated as
if they were algebraic numbers. Consequently $1 + \delta = (1 + \Delta)^{.2}$. In other words by
means of the binomial law $\delta u_x = (.2\Delta - .08\Delta^2)u_x$, where all the terms within the
parenthesis are to be considered as operating upon $u_x$. Also $\delta^2u_x = .04\Delta^2u_x$. s and
$s^2$ are defined by $s = \delta u_x$ and $s^2 = \delta^2u_x$. Therefore the middle
$s = \delta u_{.4} = .2a_1$, and $s^2 = .04B = .02(b_0 + b_1)$. We are now in position to apply (4)
to the case when $w_n = (1.04)^n$. It might prevent confusion if it is stated that
x and n are related to each other in such a way that we always interpolate
between $w_0$ and $w_1$.


  n    (1.04)^n      s         Δ        s²        Δ²

 80    23.050      .9218                          .845
 81    23.9718     .9603
 82    24.9321     .9988     4.994     .0386
 83    25.9309    1.0373
 84    26.9682    1.0758
 85    28.044     1.1190                         1.081
 86    29.1630    1.1670
 87    30.3300    1.2150     6.075     .0480
 88    31.5450    1.2630
 89    32.8080    1.3110
 90    34.119     1.3636                         1.317
 91    35.4826    1.4210
 92    36.9036    1.4784     7.392     .0574
 93    38.3820    1.5358
 94    39.9178    1.5932
 95    41.511                                    1.553
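
The working illustration of (4) can be retraced in code. The following is a sketch under the assumptions stated in the text (four values interpolated, $s^2 = .02(b_0 + b_1)$, middle $s = .2a_1$); the function name is hypothetical, and the code does not round the differences to four places as the hand computation does.

```python
def fifths_by_formula_4(w_m1, w0, w1, w2):
    """Build u at x = .2, .4, .6, .8, 1 between w0 and w1 from the differences s and s^2."""
    a1 = w1 - w0
    b0 = w1 - 2*w0 + w_m1        # second difference centred on w0
    b1 = w2 - 2*w1 + w0          # second difference centred on w1
    s2 = 0.02*(b0 + b1)          # constant second difference of the fifths, .04*B
    s_mid = 0.2*a1               # middle first difference
    values, u = [], w0
    for k in (-2, -1, 0, 1, 2):  # first differences in arithmetical progression
        u += s_mid + k*s2
        values.append(u)
    return values                # the last entry reproduces w1

values = fifths_by_formula_4(23.050, 28.044, 34.119, 41.511)
```

Because the five first differences sum to $a_1$, the end value is reproduced, which is the check on the work the text asks for. The interpolated values agree with the tabulated ones for n = 86 to 89 to within the rounding of $s^2$.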
Some of the explanation of the application of (4) applies to (3) and does
not need to be repeated. The method herein used of applying (3) is either the
same as or a development of the Henderson method of applying (5). If it is
desired to apply (3) at either end of the table, where terms are not available
for the calculation of the differences required, it can be assumed that the sixth
differences that can not be computed vanish and the required differences can
be filled in consistently with that assumption. A study of the theory
underlying this assumption shows that it does not result in a true central difference
formula and that it consequently results usually in some loss of accuracy. In
the case of (3) before the finding of the differences of (1), it is convenient to
write it as follows:

$$u_x = u_0 + xa_1 + \frac{1}{2}x(x-1)\left(B + \frac{1}{2}C\right) + \frac{1}{6}x(x-1)(x-2)C.$$

Then

$$\Delta u_x = a_1 + x\left(B + \frac{1}{2}C\right) + \frac{1}{2}x(x-1)C,$$

$$\Delta^2u_x = \left(B + \frac{1}{2}C\right) + xC, \quad\text{and}\quad \Delta^3u_x = C.$$

Suppose we wish to interpolate four values between $w_0$ and $w_1$. $\delta$ and $\delta^2$
have already been defined. $\delta^3u_x = \delta^2u_{x+.2} - \delta^2u_x$. Then $1 + \delta = (1 + \Delta)^{.2}$,
or $\delta u_x = (.2\Delta - .08\Delta^2 + .048\Delta^3)u_x$. Also $\delta^2u_x = (.04\Delta^2 - .032\Delta^3)u_x$ and
$\delta^3u_x = .008\Delta^3u_x$. s, $s^2$, and $s^3$ are defined by $s = \delta u_x$, $s^2 = \delta^2u_x$, and
$s^3 = \delta^3u_x$. The first

$$s^2 = \delta^2u_{-.2} = .04\left(B - \frac{1}{2}C\right) = .04\left(b_0 - \frac{5}{27}d_0\right).$$

The last

$$s^2 = \delta^2u_{.8} = .04\left(B + \frac{1}{2}C\right) = .04\left(b_1 - \frac{5}{27}d_1\right).$$

.1852 might be a useful approximation to $\frac{5}{27}$. The remaining $s^2$'s should be
filled in so that they are in arithmetical progression with irregularities at the
ends. If the irregularities can be distributed equally at both ends, the
irregularities cause an error in C, but none in B. Errors in B are more important
than those in C. The middle $s = \delta u_{.4} = .2a_1 - s^3$. In the following working
illustration, $w_n = \sin n$.
   n     sin n        Δ          Δ²         Δ³         Δ⁴

 -60    -.86603     .36603
 -30    -.50000     .50000     .13397    -.13397
   0     .00000     .50000     .00000    -.13397     .00000
  30     .50000     .36603    -.13397    -.09809     .03588
  60     .86603     .13397    -.23206
  90    1.00000


   n     sin n        s           s²          s³

   0     .00000     .104498     .000000
   6     .104498    .103374    -.001124
  12     .207872    .101125    -.002249    -.001125
  18     .308997    .097751    -.003374
  24     .406748    .093252    -.004499
  30     .50000                -.005624
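
For comparison, (3) can also be evaluated directly from the six tabular values rather than by building up differences. The sketch below is an illustration, assuming (3) has the constants $B = b - \frac{5}{27}d$ and $C = c_1 - \frac{5}{27}e_1$ and that the six inputs are $w_{-2}, \ldots, w_3$; the function name is hypothetical.

```python
import math

def formula_3(w, x):
    """Evaluate (1) with B = b - (5/27) d and C = c1 - (5/27) e1; w = [w_-2, w_-1, w_0, w_1, w_2, w_3]."""
    def diff(seq):
        return [q - p for p, q in zip(seq, seq[1:])]
    d1 = diff(w); d2 = diff(d1); d3 = diff(d2); d4 = diff(d3); d5 = diff(d4)
    a1 = d1[2]                    # Delta w_0
    b = (d2[1] + d2[2]) / 2       # mean of b0 = Delta^2 w_-1 and b1 = Delta^2 w_0
    c1 = d3[1]                    # Delta^3 w_-1
    d = (d4[0] + d4[1]) / 2       # mean of d0 = Delta^4 w_-2 and d1 = Delta^4 w_-1
    e1 = d5[0]                    # Delta^5 w_-2
    B = b - 5/27*d
    C = c1 - 5/27*e1
    return w[2] + x*a1 + x*(x - 1)/2*B + x*(x - 1)*(x - 0.5)/6*C

# Five-place sines at -60, -30, 0, 30, 60, 90 degrees, as in the working illustration.
w = [round(math.sin(math.radians(n)), 5) for n in (-60, -30, 0, 30, 60, 90)]
u_6_degrees = formula_3(w, 0.2)
u_12_degrees = formula_3(w, 0.4)
```

The direct values differ from the tabulated ones only in the sixth place, since the hand method adjusts the $s^2$'s into an arithmetical progression.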



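Suppose we wish to interpolate nine values between $w_0$ and $w_1$ by the use of
(3). Then $\delta u_x = u_{x+.1} - u_x$, $\delta^2u_x = \delta u_{x+.1} - \delta u_x$, and
$\delta^3u_x = \delta^2u_{x+.1} - \delta^2u_x$. Consequently $1 + \delta = (1 + \Delta)^{.1}$, or
$\delta u_x = (.1\Delta - .045\Delta^2 + .0285\Delta^3)u_x$. Then $\delta^2u_x = (.01\Delta^2 - .009\Delta^3)u_x$ and
$\delta^3u_x = .001\Delta^3u_x$. $s^2$ and $s^3$ are defined by $s^2 = \delta^2u_x$ and $s^3 = \delta^3u_x$. The first

$$s^2 = \delta^2u_{-.1} = .01\left(B - \frac{1}{2}C\right) = .01\left(b_0 - \frac{5}{27}d_0\right).$$

The last

$$s^2 = \delta^2u_{.9} = .01\left(B + \frac{1}{2}C\right) = .01\left(b_1 - \frac{5}{27}d_1\right).$$

$$\delta u_{.4} = (.1a_1 - 4s^3) - \frac{1}{2}\delta^2u_{.4} \quad\text{and}\quad \delta u_{.5} = (.1a_1 - 4s^3) + \frac{1}{2}\delta^2u_{.4}.$$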
   n     sin n        s           s²          s³

   0     .00000     .052318     .000000
   3     .052318    .052179    -.000139
   6     .104497    .051899    -.000280
   9     .156396    .051478    -.000421
  12     .207874    .050916    -.000562    -.000141
  15     .258790    .050212    -.000703
  18     .309002    .049368    -.000844
  21     .358370    .048383    -.000985
  24     .406753    .047257    -.001126
  27     .454010    .045990    -.001267
  30     .50000                -.001406
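
The operator expansions quoted for fifths and tenths are the leading terms of the binomial series for $(1 + \Delta)^{1/m} - 1$. A minimal sketch, with a hypothetical function name, computes them exactly:

```python
from fractions import Fraction

def delta_series(m, terms=3):
    """Coefficients of Delta, Delta^2, ... in (1 + Delta)**(1/m) - 1."""
    p = Fraction(1, m)
    coeffs, c = [], Fraction(1)
    for k in range(1, terms + 1):
        c = c * (p - (k - 1)) / k     # generalized binomial coefficient C(p, k)
        coeffs.append(c)
    return coeffs

fifths = delta_series(5)     # coefficients for interpolating four values
tenths = delta_series(10)    # coefficients for interpolating nine values
```

The same function with m = 6 gives the coefficients needed when five values are interpolated.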



Suppose we wish to interpolate five values between $w_0$ and $w_1$. The first

$$s^2 = \frac{1}{36}\left(b_0 - \frac{5}{27}d_0\right).$$

$$\delta u_{1/3} = \frac{1}{6}\left(a_1 - 8\delta^3u_x\right) - \frac{1}{2}\delta^2u_{1/3}$$

and

$$\delta u_{1/2} = \frac{1}{6}\left(a_1 - 8\delta^3u_x\right) + \frac{1}{2}\delta^2u_{1/3}.$$

In the following working illustration the given values of sin n are written
correct to five decimal places: in other words after each decimal point there are
five symbols or digits representing numbers; also each of these symbols is written
in the scale of ten. It can be observed that some values of $u_x$, s, $s^2$, and $s^3$ in
the working illustration have six symbols to the right of the decimal point, and
that some values have seven symbols to the right of the decimal point. In all
cases the sixth symbol to the right of the decimal point is written in the scale
of ten, and the seventh symbol is written in the scale of six. This procedure
provides a check by exactly reproducing $w_1$. Also this procedure does not cause
much fictitious accuracy, and can be quickly used after a little practice.


   n     sin n        s           s²          s³

   0     .00000     .0871305    .000000
   5     .0871305   .0864795   -.000651
  10     .1736104   .0851775   -.001302
  15     .2587883   .0832245   -.001953    -.000651
  20     .3420132   .0806205   -.002604
  25     .4226341   .0773655   -.003255
  30     .50000                -.003906
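
The exact reproduction of $w_1$ in the scale-of-six bookkeeping can be checked with exact fractions. The sketch below is an illustration only: it assumes each first difference carries the seventh symbol 5 (that is, five sixths of a unit in the sixth place), which is consistent with the tabulated values of $u_x$; the function name is hypothetical.

```python
from fractions import Fraction

def mixed(text, scale=6):
    """Read a decimal whose seventh symbol is written in the scale of `scale`."""
    digits = text.split(".")[1]
    return Fraction(int(digits[:6]), 10**6) + Fraction(int(digits[6]), scale*10**6)

# First differences s for the five-value illustration (seventh symbol in sixths).
steps = [".0871305", ".0864795", ".0851775", ".0832245", ".0806205", ".0773655"]
running, partial_sums = Fraction(0), []
for s in steps:
    running += mixed(s)
    partial_sums.append(running)
```

The six first differences add up to exactly one half, the given value of sin 30°, and the intermediate sums agree with the tabulated $u_x$ when read in the same mixed scale.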
In general if we wish to interpolate i - 1 values between $w_0$ and $w_1$ when i
is neither five nor ten, $w_1$ can be exactly reproduced if some of the symbols are
written in the scale of i. If i = 12, it is evident that we need two extra symbols,
say t and e, to stand for ten and eleven respectively. If we wish to interpolate
i - 1 values between $w_0$ and $w_1$ by the use of (4), in the computation each of
$u_x$, s and $s^2$ except the given values should contain one more symbol than each
given value contains, and the extra symbol should be written in the scale of i.


ERRATA

The Annals of Mathematical Statistics

Volume VI, No. 3, September, 1935

The eleventh line on page 137 should read

$$u'_{n+0} - u'_{n-0} = \frac{1}{54}d_0 + \frac{5}{162}f_0.$$

In the sixth line from bottom of page 139, read $s^2$'s, i.e. the plural of $s^2$.
About the middle of page 141 the formula $\delta u_{1/2}$ should read

$$\delta u_{1/2} = \frac{1}{6}\left(a_1 - 8\delta^3u_x\right) + \frac{1}{2}\delta^2u_{1/3}.$$


ON EVALUATING A COEFFICIENT OF PARTIAL CORRELATION

By Grace Strecker


It is to be shown here that when the multiple correlation coefficient $R_{n\cdot 12\ldots(n-1)}$
is found by the method of Horst¹ the partial correlation coefficient $R_{n(n-1)\cdot 12\ldots(n-2)}$
can be found in terms of the β's. If we are interested only in evaluating a
partial correlation between two variables, we may also employ the method which
will be given here.

Without loss of generality the dependent variables may be chosen to be the
nth and (n - 1)st. The coefficient of partial correlation as given by Rietz²
may be expressed in the following form:


( 1 ) 


Rn(n-Vi', 12 ■ • . (n-2) 



E(» — 1) (rt— l)nn ^nn 

Rjn-l) (n-1) 
Rin—1) (n— l)n» 


$R_{(n-1)(n-1)}$ may be treated as a new determinant $R'$. Regarding its elements
as the coefficients of a set of normal equations (n - 1 in all) whose constant
terms are zero, we may follow through the Doolittle elimination process. For
the case where n = 4 we have the table given below.

In comparing this outline with the one illustrating the Doolittle elimination
process for R when n = 4 we see that

f -A-ii 

Yu = 7u = 


Therefore, we have 


Y 22 = 722 = 


R^A^^’ 


3 3 

733 = *^33 — 21 Pi3 = ~ 2 ' 

2 2 


R' = 


Au rAim 

^ ■ R^Aii 




¹ Horst, Paul, A Short Method for Solving for a Coefficient of Multiple Correlation,
Annals of Mathematical Statistics, Vol. III, No. 1, Feb. 1932, pp. 40-44.

² Rietz, H. L., Mathematical Statistics, p. 101.
Reciprocal

1 

2 

3 

a 


7 

5 

-Au 




/ 


/ 

7i 



H 






s' 


— A 


An 




5i 



■422 

A 24 

1 

/ 

^2 

! 

1 




Al, 

R^Axi 

A 12 A 14 

R*An 


j ^22 

1 


1 

! 

! 

RUii 


■4.4 1122 

AAn24 



1 r 

1 72 

! 

Aj4n22 


RMii 

R^Aii 



! 


1 


-1 

A 1124 

A 1122 



1 






f 

^3 







R' 








AL 

R^Aii 


)323 






1124 

I2*AiiAii22 


^33 






- 1 

2 



/ 

73 





-1 




«3 


Tu = Til , 
/ 

72 2 = 72 2 ) 


7(rt“2)(fi~2) 7(n — 2)(7 i— 2)) 


7(n^l)((f— 1) — «niv 



In the general case 
Hence

«-2 / n-l \ 

■^(n— l)(n— 1) •— R — XI T*t ( ®nn — ^tn J • 

n n— 2 

Since jffi = n 7u, then. i2(„_i)(„-i)„„ = H 7 „, from which we see that 


n— 2 / »— I 

p 11 %i (®nn S 

n(n-l)(n-ll 1 \ 2 


n-l 


■Kin— 1) (n— l)nn 

But since ««« = 1, then 


n-2 

Ry. 

1 


— Oitm — ^ 


R(n—l)(ii—: 
if (n-l) (n-l) 


n— I 




It has been shown that 


Substituting the above values for 




R(n i)(n-i) 


if(n. 


"l)(n— l)nn 


^2n(n-l).12*‘ (rt-2: 


l/ 


1- Si3.n-(i-i: 
2 


n-l 


1 - r ^‘1 


or 


•Rn(«-l)!12...(n-2) = 


n— 1 

1 - L 


Hence it is seen that when the β's given by Horst (page 42) are calculated,
it is an easy matter to solve for the partial correlation $R_{n(n-1)\cdot 12\ldots(n-2)}$.
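
As a numerical cross-check, the partial correlation can be computed from cofactors of the correlation determinant and compared with the ratio of the two multiple-correlation residuals. The sketch below uses the standard cofactor formula and the standard identity $1 - R^2_{n(n-1)\cdot 12\ldots(n-2)} = (1 - R^2_{n\cdot 12\ldots(n-1)})/(1 - R^2_{n\cdot 12\ldots(n-2)})$; it is not the Doolittle tableau of the paper, and the function names and the sample matrix are illustrative assumptions.

```python
import math

def det(m):
    """Determinant by Laplace expansion (adequate for small correlation matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, v in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1)**j * v * det(minor)
    return total

def drop(m, rows, cols):
    """Matrix m with the given row and column indices removed."""
    return [[v for j, v in enumerate(row) if j not in cols]
            for i, row in enumerate(m) if i not in rows]

def multiple_r2(m, dep, preds):
    """Squared multiple correlation of variable `dep` on `preds`: 1 - R / R_(dep,dep)."""
    idx = preds + [dep]
    sub = [[m[i][j] for j in idx] for i in idx]
    last = len(idx) - 1
    return 1 - det(sub) / det(drop(sub, {last}, {last}))

def partial_r(m):
    """Partial correlation of the last two variables given the rest, via cofactors."""
    n = len(m)
    c = lambda i, j: (-1)**(i + j) * det(drop(m, {i}, {j}))
    return -c(n - 1, n - 2) / math.sqrt(c(n - 1, n - 1) * c(n - 2, n - 2))

# Hypothetical correlations among three variables, with n = 3.
m = [[1.0, 0.5, 0.4],
     [0.5, 1.0, 0.6],
     [0.4, 0.6, 1.0]]
r = partial_r(m)
```

For three variables the result agrees with the familiar formula $(r_{32} - r_{31}r_{21})/\sqrt{(1 - r_{31}^2)(1 - r_{21}^2)}$.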


St. Louis University.



A THEORY OF VALIDATION FOR DERIVATIVE SPECIFICATIONS 

AND CHECK LISTS¹

By Lee Byrne 

Visiting Professor of Secondary Education, New York University 

Part I. Research Products Which May Be Classified as Derivative 
Specifications and Check Lists 

Meaning of Specification 

In specification something is assigned a specific character. The something 
to be thus assigned a specific character may be called the specificandum. The 
specific character assigned to the specificandum, or (as a second meaning) the 
act of so doing, may be called the specification. 

A proposition is the smallest unit in which it is possible to embody a complete 
thought and is ordinarily represented by a single sentence. In specification 
the characterization may be confined to a single proposition or it may be
extended to include an indefinitely large number of propositions. So a
specification may be embodied in a sentence, a paragraph, a chapter, or a whole book.
No matter how far it is extended it will never give complete determination, as 
our knowledge cannot be made exhaustive or our control be given an absolute 
precision. 

In view of the meaning assigned to specification it is evident that very many 
books and monographs could in this sense be classified as specifications. 

Meaning of Derivative Specification 

There is a type of specification (book or monograph) which is developed by 
deriving it from a group or class of specifications which already exist. This 
class may be a total class of all such specifications, or a group of those accepted
as authoritative, or a group of those taken to be representative. A specification 
derived in this manner may be called a derivative specification. As an example 
we could take almost any first-class work by a present-day historian; by his- 
torians it would be called “secondary” because it is based on study of pre- 
existent documents called “primary sources.” 

Meaning of Check List 

The act of deriving a product from a pre-existent set of documents may, as
we have seen, take the form of a derivative specification, embracing an
assemblage of determinates or determinations. On the other hand the product
derived may be intended merely to indicate the ground covered or to be covered
by determination, without actually selecting the particular determinations.
Such a product will be called a check list. The term is not a very happy one,
but it is in very common use. If we think of a specification as an assemblage
of determinations then a check list could be thought of as a corresponding set
of determinables.² Since any determinable is capable of an indefinite number
of determinations it is evident that a long check list could give rise to an
extremely large number of different specifications, of which, of course, some
fraction might prove undesirable, inadmissible, or false.

¹ This paper is an amplification of a report made in the statistical section of the
American Educational Research Association at its meeting in February, 1931.

Modes of Specification: How We Specify 

If we examine any specification to see how the specifying is done we shall 
find that it ultimately takes the form of specification under aspects. The fol- 
lowing diagram indicates the principal (perhaps all the) possibilities in the way 
of specification. 

Naming the original or main specificandum 
Naming an aspect 

Characterization of the specificandum under the aspect named 


Naming a relation (includes process, operation etc.) 
Naming an aspect of the relation 
Characterization of the relation under aspect named 


Naming a relatum or thing related (a new specificandum) 
Naming an aspect of the relatum 

Characterization of relatum under aspect named 


Naming a part 

Naming an aspect of the part 

Characterization of the part under aspect named 


(The naming of aspects may be merely implicit but it is always present in 
principle.) 

² On the notion of the "determinable," which is due to W. E. Johnson, see his Logic,
Cambridge University Press (1921), Part I, p. xxxv and Chapter XI.
Thus it appears that if specification is pressed far enough it always ultimately
becomes specification under aspects. Aspect and determinable may be re- 
garded as synonyms. 

Current Examples of Derivative Specifications and Check Lists 

At the present time it will be found that we have very many products of 
research which take forms capable of being classified as some kind of derivative 
specification or (derivative) check list in the senses in which these expressions 
have been explained. 

I have distinguished more than twenty different logical types of derivative 
specification or check list which are exemplified in the current literature of 
educational research and related subjects. However space will not permit 
exhibition of examples of these different types. 

Part II. Validation of Derivative Specifications and Check Lists 

Many research products may be classified as derivative specifications or check 
lists, derivative in the sense that they have been derived from a group of 
documents (books, articles, journals, newspapers, courses of study, etc.) through 
analysis of their content. Such source documents themselves we shall call 
specifications or groups of specifications. 

The only validation problem raised here is the question whether the resulting 
check list or derivative specification truly represents the class of source specifi- 
cations used. The further question whether the class of source specifications 
itself constitutes a satisfactory source is not discussed. 

From this point of view, if a check list or derivative specification is based in 
some suitable manner on all the documents of the class represented, no real 
validation problem arises; the validity has to be regarded as perfect. 

It may often happen that the investigator does not wish to analyse all of 
the specifications of the class in question but prefers to save time and labor by 
confining his analysis to a select group drawn from the total class as a sample. 
In this case the problem arises as to how far results based on such sample should 
be judged to be truly representative of the entire class of specifications (most 
of which have not been analysed). A problem of this nature may be called the 
problem of validity for this kind of work. 

Such a validation problem appears to take the same form whether the product 
to be validated is a derivative specification or (derivative) check list. Accord- 
ingly we shall for the sake of brevity carry on the discussion by referring to the 
problem as that of validating (derivative) check lists. The same principles 
would apply if the product happened to be a derivative specification. 

In order to consider the validity of a check list based on a sample group of 
specifications (called here a Sample Check List)* we may hypothesize a check 
list based in the same manner on the entire class of specifications from which 
the sample was drawn. Such a hypothetical check list (which is not made) 
will be called the Ideal Check List. Then the problem of validity may be 
conceived as the question as to how far the content of the Sample Check List agrees 
with the unknown content of the Ideal Check List. 

An overlapping of the two appears ordinarily to be certain but a failure of 
complete coincidence is very highly probable. The question is what degree of 
coincidence is to be expected. 

This general validity problem naturally divides into two separate questions. 
The first question asks what proportion of the content of the Sample Check 
List may be expected to be present also in the Ideal Check List; this may be 
called the (sub-) problem of reliability. The second question asks what propor- 
tion of the content of the Ideal Check List may be expected to be present in the 
Sample Check List; this may be called the (sub-) problem of completeness. 
The answers to these two problems, if expressed in numerical percentages, could 
be called the Index of Reliability and Index of Completeness respectively. 
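
The two indices just defined can be illustrated by a minimal sketch in Python. It assumes, purely for illustration, that both lists are fully known as sets of item names, which is never the case in practice since the Ideal Check List is hypothetical. The function name and the item labels are invented for the example and are not taken from the study.

```python
def reliability_and_completeness(sample_list, ideal_list):
    """Return (reliability, completeness) as proportions between 0 and 1."""
    sample, ideal = set(sample_list), set(ideal_list)
    shared = sample & ideal
    # Reliability: share of Sample Check List items also in the Ideal Check List.
    reliability = len(shared) / len(sample)
    # Completeness: share of Ideal Check List items also in the Sample Check List.
    completeness = len(shared) / len(ideal)
    return reliability, completeness


# Hypothetical item labels, for illustration only.
sample = {"a", "b", "c", "d"}
ideal = {"b", "c", "d", "e", "f"}
rel, comp = reliability_and_completeness(sample, ideal)
```

Here three items are shared, so the reliability is 3/4 and the completeness is 3/5. The remainder of the argument is concerned with estimating these proportions when the Ideal Check List cannot be observed.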

We shall first consider these two problems in their simplest form and after- 
ward in a more complex form in which they exhibited themselves in a recent 
study by the writer.® The simple case presents no great difficulty and it is 
possible that a different method of disposing of it might be preferred. The more 
complex case, however, appears to be rather difficult of solution and the writer 
has not been able to find in the literature any developed technique for handling 
it. The simple case is presented here primarily because it affords, by further 
extension, a successful approach to the difficult problem of the more com- 
plex case. 


Simple Case 

Terms and Symbols 

The “class of specifications” will be understood to consist of all specifications 
which belong to the whole class of specifications regarded as a source, a class 
which we claim to represent in our final product. In this problem the “class” 
will not be regarded as indefinitely large but as consisting of a definite number 
of specifications, a number to be ascertained by actual count or by careful 
estimate. 

“Sample specifications” are the limited group selected from the class for 
purposes of actual analysis, and which play the role of representing the whole 
class. The remaining specifications of the class are not analyzed. 

“Sample Check List Material” is a name for the assemblage of all the different 
items found in one or more sample specifications. 

“Ideal Check List Material” is a name for a hypothetical assemblage of all 
the different items found in one or more specifications in the class. Only those 
appearing in some sample specifications can be actually known, the rest are 
hypothetical. 


’ Byrne, L. Check List Materials for Public School Building Specifications. Teachers 
College, Columbia University. 1931. 
Write 


M (constant) = total number of specifications in class 
N (variable) = number of these specifications in which a particular item 
under consideration appears (this number is hypothetical and some 
of the particular items themselves are hypothetical) 
m (constant) = number of sample specifications 
n (variable) = number of sample specifications in which a particular 
(the same) item appears 


Values of n may be expected to vary for different items, from m to 0 by inter- 
vals of 1, the zero value appertaining to any item wholly absent from the Sample 
Check List Material (hypothetically present in Ideal Check List Material). 

Values of N might be expected to vary, for different items, from M to 1 by 
intervals of 1. But in this problem the convention will be adopted that the 
range is from M downward by intervals of $M/m$. Thus if the number M should 
be five times as large as the number m then the range for N would be treated 
as proceeding from M downward by intervals of 5: M, M - 5, M - 10, ..., 5. 
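
The convention can be sketched as follows. The function name is hypothetical, and the sketch assumes that M is an exact multiple of m, a case the text does not discuss otherwise.

```python
def ideal_cell_values(M, m):
    """Values of N permitted by the convention: M, M - M/m, ..., M/m."""
    if M % m != 0:
        # Assumption of this sketch: the interval M/m is a whole number.
        raise ValueError("this sketch requires M to be a multiple of m")
    step = M // m
    return list(range(M, 0, -step))


# With M five times m (here M = 50, m = 10) the interval is 5.
values = ideal_cell_values(50, 10)
```

The list has exactly m entries, so each permitted value of N corresponds to one non-zero value of n in the sample.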

A “tabulation” will mean a statistical table showing how many different 
items appear in every possible number of specifications. A tabulation must be 
made by actual count for the items of the sample specifications, and will show 
the number of items having each possible value of n. A similar tabulation is 
hypothetical for the items in all the specifications of the class, that is for the 
number of items having each value of N permitted by the convention of the 
last paragraph. 
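
A tabulation for the sample can be sketched in a few lines of Python. The representation of each sample specification as a set of item names, and the item names themselves, are assumptions made only for this illustration.

```python
from collections import Counter


def tabulate(sample_specs):
    """Return (n for each item, number of items having each value of n)."""
    # n = number of sample specifications in which the item appears.
    n_per_item = Counter(item for spec in sample_specs for item in set(spec))
    # Tabulation cells: how many different items share each value of n.
    cells = Counter(n_per_item.values())
    return n_per_item, cells


# Three hypothetical sample specifications (m = 3).
specs = [{"roof", "door"}, {"roof", "window"}, {"roof", "door", "marble"}]
n_per_item, cells = tabulate(specs)
```

In this small example one item appears in all three specifications, one in two, and two in only one, so the cells for n = 3, 2, 1 contain 1, 1, and 2 items respectively. The cell n = 0 cannot be counted from the sample, which is why the corresponding tabulation for the whole class remains hypothetical.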

“Tabulation cell” (or simply “cell”) will mean, as needed, either the number 
of items or the group of items appearing in any designated number of specifi- 
cations. For Sample Check List Material it will be the number or group of 
items to which a particular value of n appertains; for Ideal Check List similarly 
the number of items or group of items to which a particular value of N appertains 
(hypothetically). 

“Sample Check List” will mean a list of items selected from the Sample 
Check List Material according to some adopted criterion. For illustrative 
purposes we shall consider this criterion to be, for example, the numerical 
ratio $n \ge m/2$. 

“Ideal Check List” will mean a list of items selected from the Ideal Check 
List Material according to some adopted criterion. For illustrative purposes 
we shall consider this criterion to be the numerical ratio $N \ge M/2$. 


Problem of Reliability 

The problem of reliability may be restated and renamed the General Reli- 
ability Problem. This may be broken up into a group of problems which will 
be called Elementary Reliability Problems. Each of the latter may be in turn 
broken up into a group of problems which will be called Ultimate Reliability 
Problems. Each Ultimate Reliability Problem may be solved directly. Com- 
bination of these solutions will yield solutions of the Elementary Reliability 
Problems. Combinations of the latter solutions will finally yield the solution 
of the General Reliability Problem. 

These problems will now be stated: 

General Reliability Problem: What proportion of the items present in Sample 
Check List may be expected to be present also in Ideal Check List? 

Elementary Reliability Problem: What proportion of the items in a particular 
cell in Sample Check List may be expected to be present also in Ideal Check 
List? 

Ultimate Reliability Problem: What proportion of the items in a particular 
cell in Sample Check List may be expected to be present also in some designated 
cell in Ideal Check List? 

To solve an Ultimate Problem : 

From the Fundamental Theorem in the Theory of Inductive Probability 
(Whittaker, E. T. and Robinson, G. The Calculus of Observations. London; 
Blackie & Son. 1924. p. 305) the solution may be expressed as 

$$\frac{P_R\,p_R}{\sum P\,p}.$$

Whittaker and Robinson’s statement of the Fundamental Theorem in the 
Theory of Inductive Probability is as follows (form slightly changed without 
change in meaning): 

“Suppose that a certain observed phenomenon may be accounted for by any 
one of a certain number of hypotheses, of which one, and not more than one, 
must be true: suppose moreover that the probability of the R-th hypothesis, 
as based on information in our possession before the phenomenon is observed, 
is $P_R$, while the probability of the observed phenomenon, on the assumption of 
the truth of the R-th hypothesis, is $p_R$. Then when the observation of the 
phenomenon is taken into consideration, the probability of the R-th hypothesis is 

$$\frac{P_R\,p_R}{\sum P\,p}$$

where the symbol $\sum$ denotes the summation over all the hypotheses.”* 

It is clear that an Ultimate Reliability Problem is a case falling under this 
Fundamental Theorem. The observed phenomenon is any item occurring in 
any specified cell of Sample Check List, say cell n = s. It may be accounted 
for by a certain number of hypotheses as to its source in the Ideal Check List 


* For the fundamental position of this theorem in a theory of science and for its proof 
one may also consult Jeffreys, H. Scientific Inference. Cambridge : Cambridge University 
Press. 1931. Chapter II (section 2,34). 
Material; the different cells in the Ideal Check List Material are these different 
hypotheses of origin, hypothetical because we do not know from which one it 
has come but only that it must have come from some one of them; the cell from 
which it actually comes is the true hypothesis, though we do not know which 
one that is. That the origin of the item is in cell N = R is the R-th hypothesis, 
and its probability is written $P_R$. The probability of the occurrence of the 
phenomenon on the assumption of the truth of the R-th hypothesis is the 
probability that an item in cell N = R will appear in Sample Check List in cell n = s 
and its probability is written $p_R$. As we clearly have in our Ultimate Reliability 
Problem a case falling under the Fundamental Theorem quoted we may accept 
as the required solution of the Ultimate Reliability Problem the formula already 
given in the initial statement: 

$$\frac{P_R\,p_R}{\sum P\,p}.$$


This expresses the probability that any item found in Sample-Check-List cell 
n = s comes from (and appears in) Ideal-Check-List-Material cell N = R, or 
it gives the proportion of items found in Sample-Check-List cell n = s that 
may be expected to come from (or appear in) Ideal-Check-List-Material cell 
N = R. 
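
The solution formula is the familiar rule for inverse probability and can be sketched as below. The function name, the hypothesis labels, and the numerical values are hypothetical; exact fractions are used only so that the arithmetic can be checked by hand.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Apply P_R * p_R / sum(P * p) to every hypothesis R."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(joint.values())
    return {h: value / total for h, value in joint.items()}


# Two hypothetical source cells with prior probabilities P and
# probabilities p of producing the observed sample cell.
P = {"cell A": Fraction(1, 4), "cell B": Fraction(3, 4)}
p = {"cell A": Fraction(1, 2), "cell B": Fraction(1, 6)}
post = posterior(P, p)
```

In this example the less probable source cell is three times as likely to produce the observation, so the two products are equal and each hypothesis receives a probability of one half.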

Meaning of any value of P (say $P_R$) = the probability that any item, drawn 
at random from those cells of Ideal Check List Material which are possible 
sources of items in Sample-Check-List cell n = s, will happen to be drawn from 
cell N = R. 

Meaning of any value of p (say $p_R$) = the probability that any item in 
Ideal-Check-List cell N = R will also be present in Sample-Check-List cell n = s. 
(Important: this supposition is not equivalent to its converse.) 

Evaluation of $P_R$: 

$$P_R = \frac{\text{number of items in cell } N = R}{\text{number of items in all cells which are possible sources of items in cell } n = s}.$$


For this ratio it is necessary to assume that the shape of the numerical curve 
formed by the group of Ideal-Check-List-Material cells is the same as that of 
the numerical curve formed by the group of Sample-Check-List-Material cells. 
On this assumption we may replace the numerator by the number of items in 
the Sample-Check-List-Material cell having an abscissa corresponding to that 
of the Ideal-Check-List-Material cell N = R, and replace the denominator by 
the sum of the numbers of items in all the cells with abscissae corresponding to 
those of Ideal-Check-List-Material cells which are possible sources of items in 
cell n = s. 
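
The replacement described above may be sketched as follows. Two points are assumptions of the sketch rather than statements of the text: the sample cell n = k is taken to correspond to the Ideal cell N = k(M/m), and a cell N is treated as a "possible source" of cell n = s when it contains at least s specifications with the item and at least m - s without it. The function name and the tabulation are hypothetical.

```python
from fractions import Fraction


def priors_from_sample(cells, M, m, s):
    """Estimate P_R for every possible source cell N = R of sample cell n = s.

    cells maps each sample value n = k to the number of items found there.
    """
    step = M // m
    candidates = {}
    for k in range(1, m + 1):
        N = k * step  # Ideal cell with the corresponding abscissa
        # Assumed test for a possible source: s specifications with the
        # item and m - s without it must be available to draw from.
        if N >= s and M - N >= m - s:
            candidates[N] = cells.get(k, 0)
    total = sum(candidates.values())
    return {N: Fraction(count, total) for N, count in candidates.items()}


# Hypothetical tabulation for M = 20, m = 4.
cells = {1: 6, 2: 3, 3: 2, 4: 1}
P = priors_from_sample(cells, M=20, m=4, s=2)
```

For s = 2 the cell N = 20 is excluded, since an item present in every specification of the class could not be absent from two of the four sample specifications. The remaining cells N = 5, 10, 15 receive 6/11, 3/11, and 2/11.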

Evaluation of $p_R$: 

By the aid of “the definition of probability which is used in practically all 
treatises on the subject” (Coolidge, J. L. An Introduction to Mathematical 
Probability. Oxford: Oxford University Press. 1925. p. 4) and the principle 
underlying the Theory of Combinations (Whitworth, W. A. Choice and Chance. 
New York: G. E. Stechert & Co. 1927. Proposition II) we are able to arrive 
at the evaluation: 

$$p = \frac{C^{M-N}_{m-n}\, C^{N}_{n}}{C^{M}_{m}},$$

in which, for any p (say $p_R$), we employ for N the value N = R, and for n the 
value n = s. As the denominator later cancels out it may be disregarded 
throughout, simplifying the formula to 

$$p = C^{M-N}_{m-n}\, C^{N}_{n}.$$

(A symbol such as $C^N_n$ is read “the number of combinations of N things taken 
n at a time” ; also written in several other forms.) 

The definition referred to may be worded as follows (Coolidge’s own preferred 
definition is not quite the same) : 

“An event can happen in a certain number of ways, which are all equally 
likely. A certain proportion of these are classed as favorable. The ratio of 
the number of favorable ways to the total number is called the probability that 
the event will turn out favorably.” 

The principle underlying the Theory of Combinations may be quoted from 
Whitworth as follows (also found in ordinary works on algebra) : 

“If one operation can be performed in m ways, and then a second can be per- 
formed in n ways, and then a third in r ways, (and so on), the number of ways 
of performing all the operations will be m X n X r X etc.” 

If it is not at once clear that the formula for evaluation of p follows from the 
definition and principle just quoted, the following considerations should make 
it evident. 

We are working in terms of a particular item belonging to a particular 
Ideal-Check-List-Material cell, say cell N = R. “Favorable” occurrence requires 
that this item fall in a particular Sample-Check-List cell, say n = s, while 
falling in any other Sample-Check-List-Material cell (including cell n = 0 for 
absence) is “unfavorable.” Again the real meaning of the “favorable” 
occurrence is that the item will be found in just n = s out of the m specifications of 
the sample, and absent in the remaining m - n specifications of the sample. 
Moreover presence in Ideal-Check-List-Material cell N = R means that the 
item occurs in just N = R of the M specifications that constitute the whole 
class and is absent in M - N of these specifications. The total number of all 
the ways (favorable and unfavorable) in which our event can happen means the 
same as the total number of all the ways in which a group of m specifications 
can be selected from a larger group of M, and this is, of course, written $C^M_m$ and 
given us in our denominator. The number of favorable ways in which our 
event can happen means the same as the number of ways in which N 
specifications containing the item can form groups of n specifications while at the 
same time M - N specifications not containing the item can form groups of 
m - n specifications; the first distribution can be done in $C^N_n$ ways and the 
second in $C^{M-N}_{m-n}$ ways, so by Whitworth’s principle the number of ways in which 
these things can happen simultaneously is $C^{M-N}_{m-n}\, C^N_n$. Assembling numerator 
and denominator we have the formula initially stated for evaluation of p, viz.: 

$$p = \frac{C^{M-N}_{m-n}\, C^{N}_{n}}{C^{M}_{m}}.$$

This is the general formula; in applying to the particular example N = R, n = s 
the replacements for N and n, of course, give 

$$p_R = \frac{C^{M-R}_{m-s}\, C^{R}_{s}}{C^{M}_{m}}.$$
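
The evaluation of p is the probability of drawing without replacement, and may be checked numerically with the following sketch. The function name and the numbers M = 20, m = 4 are hypothetical; exact fractions are used so that the results can be verified by hand.

```python
from fractions import Fraction
from math import comb


def p_simple(M, m, N, n):
    """Probability that an item present in N of the M specifications of the
    class appears in exactly n of the m sample specifications."""
    if not 0 <= n <= m:
        return Fraction(0)
    favorable = comb(M - N, m - n) * comb(N, n)  # Whitworth's product rule
    return Fraction(favorable, comb(M, m))       # all ways of choosing the sample


# An item present in 10 of 20 specifications, with a sample of 4.
value = p_simple(20, 4, 10, 2)
check = sum(p_simple(20, 4, 10, n) for n in range(5))
```

Here the favorable ways number 45 x 45 = 2025 out of 4845, or 135/323, and the probabilities for n = 0, 1, ..., 4 sum to unity as they should.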


Having a means of evaluating P and p we may solve all needed Ultimate 
Problems. The resulting solutions of the needed Ultimate Reliability Problems 
(not necessarily completed) enable us to arrive at the solution of any needed 
Elementary Reliability Problem in the form of a percentage which may be 
called an Index of Reliability for the Sample-Check-List cell in question. In 
computing this percentage we distinguish source-cells that belong to the Ideal 
Check List from other source-cells that belong to the Ideal Check List Material 
but not to the Ideal Check List. 
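
One way of assembling a cell Index of Reliability is sketched below. It is an illustration under stated assumptions, not the computing routine of the study: the prior for each source cell is taken from the corresponding sample cell (n = k for N = kM/m), the working formula for p without its denominator is used, and the Ideal Check List criterion is taken as N at least M/2. The function name and tabulation are hypothetical.

```python
from fractions import Fraction
from math import comb


def cell_reliability_index(M, m, s, cells):
    """Share of the items in sample cell n = s expected to come from
    source cells that satisfy the Ideal Check List criterion N >= M/2."""
    step = M // m
    weights = {}
    for k in range(1, m + 1):
        N = k * step
        # Prior count times working formula; common factors cancel in the ratio.
        weights[N] = cells.get(k, 0) * comb(M - N, m - s) * comb(N, s)
    total = sum(weights.values())
    in_ideal = sum(w for N, w in weights.items() if 2 * N >= M)
    return Fraction(in_ideal, total)


# Hypothetical tabulation for M = 20, m = 4; index for sample cell n = 2.
cells = {1: 6, 2: 3, 3: 2, 4: 1}
index = cell_reliability_index(20, 4, 2, cells)
```

With these figures the weights for N = 5, 10, 15, 20 are 6300, 6075, 2100, and 0, so the index is 8175/14475 = 109/193, or about 56 per cent.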

By properly averaging cell-indices of Reliability (which are really Indices of 
Reliability for the individual items in the cells) we may obtain a solution of the 
General Problem of Reliability in the form of an Average Index of Reliability 
for the Sample Check List as a whole. 

In addition to the Average Index of Reliability for the Sample Check List 
we may easily secure also Average Indices of Reliability for any series of briefer 
Sample Check Lists selected from the Sample Check List, by properly averaging 
the Indices of cells contained in any Sample Check List in question, keeping 
the original criterion for Ideal Check List. 

In practice it may not be necessary to compute all cell-Indices, as a portion 
of these may be entered in tables by any methods of interpolation regarded as 
acceptable. 

Problem of Completeness 

Again we have General, Elementary, and Ultimate Problems. These may 
be stated as follows : 

General Completeness Problem: What proportion of the items present in 
Ideal Check List may be expected to be present also in Sample Check List? 

Elementary Completeness Problem : What proportion of the items present in 
Ideal Check List may be expected to be present also in some designated cell in 
Sample Check List? 

Ultimate Completeness Problem: What proportion of the items in a particular 
cell in Ideal Check List may be expected to be present also in some designated 
cell in Sample Check List? 
To solve an Ultimate Problem: 

From principles already used the proportion to be expected is the same as the 
value of p alone in an Ultimate Reliability Problem, viz.: 

$$\frac{C^{M-N}_{m-n}\, C^{N}_{n}}{C^{M}_{m}}.$$

By the use of this formula we may solve the Ultimate Problems for all values 
of N represented in Ideal Check List and all values of n represented in Sample 
Check List; some of these solutions will have a value of zero. 

For each value of n, if we properly average the solutions of the Ultimate 
Problems, we obtain a solution of the Elementary Problem for one Sample- 
Check-List cell in the form of a percentage which may be called the Index of 
Completeness for the particular Sample-Check-List cell. In securing this 
average it is necessary to multiply each Ultimate Problem solution by a relative 
number corresponding to the assumed ratio of number of items in the particular 
Ideal-Check-List cell to the number of items in all the Ideal-Check-List cells. 
The source of the assumed relative numbers is the same as that used in evaluat- 
ing P in the Reliability Problem. 

When we have an Index of Completeness for each Sample-Check-List cell 
we may obtain a Total Index of Completeness for the Sample Check List as a 
whole by summing the cell-indices of Completeness of all the cells of the Sample 
Check List. By an equivalent but preferable method we may divide the last- 
named result by the sum of the cell-indices of Completeness of all the cells of 
the Sample Check List Material (including cell n = 0); by this method the 
$C^M_m$ of the original formula cancels out and so may be disregarded throughout. 
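
The summation for a Total Index of Completeness may be sketched as follows, using the first (direct) method with the denominator retained. The assumptions are those already made for the Reliability Problem: relative numbers of items in the Ideal cells are borrowed from the corresponding sample cells, and the illustrative criteria N at least M/2 and n at least m/2 are used. The function name and tabulation are hypothetical.

```python
from fractions import Fraction
from math import comb


def total_completeness_index(M, m, cells):
    """Expected share of Ideal Check List items found in the Sample Check List."""
    step = M // m
    # Ideal Check List cells (N >= M/2) with assumed relative item counts.
    ideal = {k * step: cells.get(k, 0)
             for k in range(1, m + 1) if 2 * k * step >= M}
    n_items = sum(ideal.values())
    total = Fraction(0)
    for n in range(m + 1):
        if 2 * n < m:
            continue  # sample cell not in the Sample Check List
        for N, count in ideal.items():
            p = Fraction(comb(M - N, m - n) * comb(N, n), comb(M, m))
            total += Fraction(count, n_items) * p  # weighted Ultimate solution
    return total


cells = {1: 6, 2: 3, 3: 2, 4: 1}
completeness = total_completeness_index(20, 4, cells)
```

For M = 20 and m = 4 the result is 24530/29070, or about 84 per cent. Restricting the inner sum to a single value of n gives the Index of Completeness for that Sample-Check-List cell.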

A Total Index of Completeness is similarly obtainable for a Sample Check 
List (any Sample Check List selected from the Sample Check List) by summing 
the cell-indices of Completeness of the appropriate cells. Thus, if desired, a 
tabulation may be made showing Indices of Completeness for a series of Sample 
Check Lists differing in extent. 

A combined tabulation may show for each of a series of Sample Check Lists 
its Index of Reliability and its Index of Completeness. 

More Complex Case 

So far we have considered a validation problem of simple type. In the writer's 
Check List Materials for Public School Building Specifications'* a more complex 
problem was presented, due to the introduction of the concept of the Applicable 
Case. A Check List for School Building Specifications was developed with a 
view to its use by school officials or others as an aid in judging proposed school 
building specifications with reference to their completeness or incompleteness 
of determination. The position was taken that a new specification ought not 
to be charged with the omission of a given item unless the building (as represented 

* Byrne, L. Check List Materials for Public School Building Specifications. Teachers 
College, Columbia University. 1931. 
by the specification) had an Applicable Case for that item. To give a 
single example, the Check List contains various items relating to the specifying 
of marble work. It did not seem appropriate to score a specification down for 
the omission of numerous determinations in marble work, if in fact there was no 
marble in the building to be determined. This situation is expressed by saying 
that there are no Applicable Cases for those items. 

It seems likely that there are other research problems in which the question 
ought to be raised whether adequate treatment does not require the introduction 
of the concept of the Applicable Case. If so a more difficult validation problem 
is presented than would otherwise be the case. 

In the more complex case indicated solution is obtained by making the neces- 
sary extensions in the procedures followed for the simple case. 


Modifications in Terms and Symbols 


M (constant) = total number of specifications in class 

D (variable) = number of these specifications containing an Applicable Case 
for a particular item 

N (variable) = number of the latter specifications which also contain the 
particular item 

m (constant) = number of specifications in sample 

d (variable) = number of these specifications containing an Applicable Case 
for the particular item 

n (variable) = number of the latter specifications which also contain the 
particular item 


Values of d range from m to 0 by intervals of 1, and those of n range from d 
to 0 by intervals of 1. 

The convention is adopted that values of D range from M downward, and 
those of N from D downward, by intervals of $M/m$. 

(Tabulation) cell will mean the number of items (or the group of items) having 
a common value of d and a common value of n. 

The criterion for membership in the Sample Check List may, for illustrative 
purposes, be taken as $n \ge d/2$. 

The criterion for membership in the Ideal Check List may, for illustrative 
purposes, be taken as $N \ge D/2$. 


Problem of Reliability 

Following the same principle and line of reasoning as for the simple case we 
arrive at the same general formula for the solution of an Ultimate Reliability 
Problem, viz. : 


$$\frac{P_R\,p_R}{\sum P\,p}.$$

Meanings of values of P and p are the same as before except that cells must be 
described respectively in terms of n and d values instead of n values alone, or 
N and D values instead of N values alone. 

$P_R$ is evaluated in the same manner as before, using the new meaning of 
“cell.” 

For p the evaluation now becomes 

$$p = \frac{C^{M-D}_{m-d}\, C^{D-N}_{d-n}\, C^{N}_{n}}{C^{M}_{m}},$$

which through cancellation may be simplified to the working formula 

$$p = C^{M-D}_{m-d}\, C^{D-N}_{d-n}\, C^{N}_{n}.$$
The reasoning leading to the denominator $C^M_m$ is unchanged and so this 
denominator itself remains unchanged. The numerator for the evaluation of p is 
altered to the extent shown by the consideration that, in producing “favorable” 
ways, we now have to do with the number of simultaneous possibilities of 
drawing n specifications from a group of N specifications containing a particular item, 
drawing d - n specifications from a group of D - N specifications which 
contain an Applicable Case for this particular item but do not contain this item 
itself, and of drawing m - d specifications from a group of M - D specifications 
which contain no Applicable Case for the item. 
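
The three simultaneous drawings can be checked numerically with a short sketch. The function name and the figures M = 12, m = 3, D = 8, N = 4 are hypothetical and chosen only to keep the arithmetic small.

```python
from fractions import Fraction
from math import comb


def p_complex(M, m, D, N, d, n):
    """Probability that the sample shows d Applicable Cases, n of them with
    the item, when the class has D Applicable Cases, N of them with the item."""
    if not (0 <= n <= d <= m):
        return Fraction(0)
    favorable = (comb(N, n)                # specifications containing the item
                 * comb(D - N, d - n)      # Applicable Case, item omitted
                 * comb(M - D, m - d))     # no Applicable Case
    return Fraction(favorable, comb(M, m))


value = p_complex(12, 3, 8, 4, 2, 1)
check = sum(p_complex(12, 3, 8, 4, d, n)
            for d in range(4) for n in range(d + 1))
```

Here the favorable ways number 4 x 4 x 4 = 64 out of 220, or 16/55, and the probabilities over all admissible pairs (d, n) sum to unity.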

Problem of Completeness 

Following the same principles and line of reasoning as for the simple case we 
arrive at the following formula for the solution of an Ultimate Completeness 
Problem: 

$$\frac{C^{M-D}_{m-d}\, C^{D-N}_{d-n}\, C^{N}_{n}}{C^{M}_{m}}.$$

By suitable treatment bringing about cancellations the working formula may 
be reduced to 

$$C^{M-D}_{m-d}\, C^{D-N}_{d-n}\, C^{N}_{n}.$$

Techniques and Aids in Computation 

The present paper is limited to an attempt to explain with adequate fullness 
the proposed theory of validation for derivative specifications and check lists, 
and space is lacking in which to exhibit techniques of actual computation. One 
specimen problem worked out in fairly complete detail, together with remarks 
on available aids in computation will be found in Appendix A3 in typewritten 
copies of the writer's “Check List Materials for Public School Building Specifi- 
cations" on file in the Library of Teachers College, Columbia University; the 
Appendices are not included in the printed edition. 



A NOTE ON SHEPPARD’S CORRECTIONS 

By Solomon Kullback 

In this note we shall derive a simple relation between the characteristic 
function of the grouped distribution and the characteristic function of the 
original continuous distribution, assuming that the frequency curve has high 
contact with the x-axis at both ends. 

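If we set $p_s = \int_{z_s - w/2}^{z_s + w/2} f(x)\,dx$, where the $z_s$ are the class marks and $w$ is the class 
interval, then the characteristic function of the grouped distribution is given by 

(1) $\Phi(t) = \sum_s p_s\, e^{itz_s}$ 

where $i = \sqrt{-1}$. Replacing $p_s$ by its value as given above, we have 

(2) $\Phi(t) = \sum_s e^{itz_s} \int_{-w/2}^{w/2} f(x + z_s)\,dx = \int_{-w/2}^{w/2} dx \sum_s e^{itz_s} f(x + z_s)$. 

There is no difficulty about justifying the inversion of the order of integration 
and summation. 

Because of the assumption of high contact with the axis of x at both ends of 
the frequency curve, we have 

(3) $\varphi(t) = \int_{-\infty}^{\infty} e^{itx} f(x)\,dx = w \sum_s e^{it(x + z_s)} f(x + z_s)$ 

so that 

$$\Phi(t) = \varphi(t)\,\frac{1}{w}\int_{-w/2}^{w/2} e^{-itx}\,dx = \varphi(t)\,\frac{\sin(wt/2)}{wt/2}.$$

This is the desired result, from which there follow the desired moment 
relations by equating coefficients of $(it)^r$ on both sides of the equation. For 
example: 

$$1 + M_1 (it) + \frac{M_2}{2!}(it)^2 + \frac{M_3}{3!}(it)^3 + \cdots = \left(1 + \frac{(it)^2}{3!}\,\frac{w^2}{4} + \frac{(it)^4}{5!}\,\frac{w^4}{16} + \cdots\right)\left(1 + m_1 (it) + \frac{m_2}{2!}(it)^2 + \cdots\right)$$

$$= 1 + m_1 (it) + \left(\frac{m_2}{2!} + \frac{w^2}{24}\right)(it)^2 + \left(\frac{m_3}{3!} + \frac{m_1 w^2}{24}\right)(it)^3 + \cdots$$

or 

$$M_1 = m_1; \qquad M_2 = m_2 + \frac{w^2}{12}; \qquad M_3 = m_3 + \frac{m_1 w^2}{4}.$$

Washington, D. C. 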
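
The moment relations may be checked numerically. The sketch below assumes a normal frequency curve, which has high contact at both ends, grouped into classes of width w with class marks at multiples of w; the parameter values and function names are hypothetical. It compares the first three moments about the origin of the grouped distribution with the values that the corrections (capital M grouped, small m continuous) require.

```python
from math import erf, sqrt


def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))


def grouped_moments(mu, sigma, w, half_range=40):
    """First three moments about the origin of the grouped distribution."""
    moments = [0.0, 0.0, 0.0]
    for s in range(-half_range, half_range + 1):
        z = s * w  # class mark
        # Frequency of the class centered at z.
        p = normal_cdf(z + w / 2, mu, sigma) - normal_cdf(z - w / 2, mu, sigma)
        for r in range(3):
            moments[r] += p * z ** (r + 1)
    return moments


mu, sigma, w = 1.3, 1.0, 0.5
M1, M2, M3 = grouped_moments(mu, sigma, w)
# Moments of the continuous normal curve about the origin.
m1 = mu
m2 = mu ** 2 + sigma ** 2
m3 = mu ** 3 + 3 * mu * sigma ** 2
```

For a class interval of half the standard deviation the agreement is close to the limit of floating-point accuracy; with much wider classes, or with curves lacking high contact, the relations hold only approximately.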



THE LIMITING DISTRIBUTIONS OF CERTAIN STATISTICS* 

By J. L. Doob 


There have been many advances in the theory of probability in recent years, 
especially relating to its mathematical basis. Unfortunately, there appears to 
be no source readily available to the ordinary American statistician which 
sketches these results and shows their application to statistics. It is the purpose 
of this paper to define the basic concepts and state the basic theorems of prob- 
ability, and then, as an application, to find the limiting distributions for large 
samples of a large class of statistics. One of these statistics is the tetrad differ- 
ence, which has been of much concern to psychologists. 

I 

Let $F(x)$ be a monotone non-decreasing function, continuous on the left, 
defined at every point of the x-axis, and satisfying the conditions 

(1) $\lim_{x \to -\infty} F(x) = 0, \qquad \lim_{x \to +\infty} F(x) = 1.$ 

Then the function $F(x)$ is said to be the distribution function of a chance variable 
$\mathbf{x}$, and $F(x)$ is said to be the probability that $\mathbf{x} < x$. The curve $y = F(x)$ is 
sometimes called the ogive in statistics. The chance variable $\mathbf{x}$ itself is merely 
the function $x$, taken in conjunction with the monotone function $F(x)$. 

If $\int_{-\infty}^{\infty} x\,dF(x)$ exists as an absolutely convergent Stieltjes integral, the value 
of the integral is called the expectation of $\mathbf{x}$, and will be denoted by $E(\mathbf{x})$. 

II 

Let $F(x_1, \ldots, x_n)$ be a function defined over n-dimensional space, which is 
monotone, non-decreasing, continuous on the left in each coordinate if the others 
are held fast, and which satisfies the conditions 

(2) $\lim_{x_j \to -\infty} F(x_1, \ldots, x_n) = 0, \quad j = 1, \ldots, n, \qquad \lim_{x_1, \ldots, x_n \to +\infty} F(x_1, \ldots, x_n) = 1$ 

where in the last limit, $x_1, \ldots, x_n$ become infinite together. Then $F(x_1, \ldots, x_n)$ 
is said to be the distribution function of a set of chance variables $\mathbf{x}_1, \ldots, \mathbf{x}_n$, 
and $F(x_1, \ldots, x_n)$ is said to be the probability that all the inequalities $\mathbf{x}_j < x_j$, 
$(j = 1, \ldots, n)$, hold simultaneously. It can be shown that the function 

$F_j(x) = \lim_{x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_n \to +\infty} F(x_1, \ldots, x_{j-1}, x, x_{j+1}, \ldots, x_n)$ 

is of the type discussed in §I. The 


* Research under a grant-in-aid from the Carnegie Corporation. 
function $F_j(x)$ is called the distribution function of $\mathbf{x}_j$. The chance variables 
$\mathbf{x}_1, \ldots, \mathbf{x}_n$ are called independent if $F(x_1, \ldots, x_n) = \prod_{j=1}^{n} F_j(x_j)$. The chance 
variables $\mathbf{x}_1, \ldots, \mathbf{x}_n$ are merely the functions $x_1, \ldots, x_n$ defined over 
n-dimensional space, taken in conjunction with the function $F(x_1, \ldots, x_n)$. 

If $a_1, \ldots, a_n$ are any real numbers, the number $F(a_1, \ldots, a_n)$, the 
probability that $\mathbf{x}_j < a_j$, $j = 1, \ldots, n$, is also called the probability that a sample 
$(x_1, \ldots, x_n)$ shall be in the region of n-dimensional space determined by 
$x_j < a_j$, $j = 1, \ldots, n$. Thus regions of this special type have probabilities 
attached to them. Using the usual additivity rules, probabilities can be 
attached to more general regions, and in fact probability can be defined on a 
collection C of regions including all open sets, closed sets and all sets which can 
be obtained from them by repeatedly taking sums, products, and complements. 
(Such point sets are called Borel measurable.) The resulting function of point 
sets is non-negative and completely additive.* 

If $f(x_1, \ldots, x_n)$ is any function of $x_1, \ldots, x_n$ let $E_x$ be the set of points 
$(x_1, \ldots, x_n)$ where $f < x$. Suppose that $E_x$ is in the collection C for all values 
of $x$, and let $F(x)$ be the probability attached to the set $E_x$. Then it is readily 
seen that $F(x)$ has the properties discussed in §I and is therefore the distribution 
function of a new chance variable $\mathbf{x}$, which will be denoted by $f(\mathbf{x}_1, \ldots, \mathbf{x}_n)$. 
The chance variable $f(\mathbf{x}_1, \ldots, \mathbf{x}_n)$ is merely the function $f(x_1, \ldots, x_n)$ taken 
in conjunction with the distribution function $F(x_1, \ldots, x_n)$. (An example is 
$f(x_1, \ldots, x_n) = x_1 + x_n$, determining the chance variable $\mathbf{x}_1 + \mathbf{x}_n$.) 
Suppose that $E(\mathbf{x})$ exists, 

(3) $E(\mathbf{x}) = \int_{-\infty}^{\infty} x\,dF(x)$. 

Then it can be shown that the n-dimensional (Lebesgue-)Stieltjes integral 

(4) $\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, \ldots, x_n)\,dF(x_1, \ldots, x_n)$ 

exists and has the value $E(\mathbf{x})$. Conversely the existence of the integral (4) 
implies that of (3). 

If there is a Lebesgue-integrable function $\varphi(x_1, \dots, x_n)$ such that



² That is, if $P(E)$ is the value of the set function on the set $E$, and if $E_1, E_2, \dots$ are
point sets with no common points, and which are in $C$,
$P\left(\sum_{m=1}^{\infty} E_m\right) = \sum_{m=1}^{\infty} P(E_m)$.
the function $\varphi$ is said to be the density function of the distribution. In this
case (4) becomes

(4')  $\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, \dots, x_n) \, \varphi(x_1, \dots, x_n) \, dx_1 \cdots dx_n .$

The probability attached to a point set $E$ in the collection $C$ is the integral
(4) (or (4') if there is a density function), where $f = 1$ over $E$ and $f = 0$
elsewhere.


III

Let $x, x_1, x_2, \dots$ be a sequence of chance variables. We suppose that for
every integer $n$, $x$, $x_n$ determine a bivariate distribution. Then it is readily
seen from §II that there is a chance variable $|x_n - x|$ and therefore that
$P\{|x_n - x| \le \lambda\}$³ is defined for every number $\lambda$. If

(6)  $\lim_{n \to \infty} P\{|x_n - x| \le \lambda\} = 1$

for every positive number $\lambda$, the sequence $x_n$ is said to converge stochastically,
or to converge in probability, to $x$. If $a$ is a constant, $P\{|x_n - a| \le \lambda\}$ is also
defined for every number $\lambda$, and there is a corresponding definition of stochastic
convergence to $a$. The usual theorems about limits hold: if $x_n$, $y_n$ converge
stochastically to $x$, $y$, then $x_n + y_n$ converges stochastically to $x + y$, etc.

An example of stochastic convergence is given by the law of large numbers.
Let $x$ be a chance variable with distribution function $F(x)$ and suppose that
$E(x)$, $E(x^2)$ exist, i.e. that

$\int_{-\infty}^{\infty} x \, dF(x), \qquad \int_{-\infty}^{\infty} x^2 \, dF(x)$

are absolutely convergent integrals. Let $x_1, \dots, x_n$ be chance variables whose
$n$-variate distribution function is $\prod_{j=1}^{n} F(x_j)$: we are thus supposing that the
variables all have the same distribution and form an independent set. Then
$\frac{1}{n} \sum_{j=1}^{n} x_j$ is a new chance variable, and Tchebycheff's inequality furnishes an
immediate proof that $\frac{1}{n} \sum_{j=1}^{n} x_j$ converges stochastically to $E(x)$.⁴
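The Tchebycheff argument can be checked numerically. The sketch below is illustrative only and is not part of the original paper: it assumes variables uniformly distributed on [0, 1] (so that $E(x) = 1/2$ and the variance is $1/12$), and the function names are hypothetical. It compares the simulated frequency of $|X_n - E(x)| > \lambda$ with the Tchebycheff bound.

```python
import random


def sample_mean(n, rng):
    """Mean of n independent draws from the uniform distribution on [0, 1]."""
    return sum(rng.random() for _ in range(n)) / n


def deviation_frequency(n, lam, trials=2000, seed=0):
    """Simulated frequency of the event |X_n - E(x)| > lam, where E(x) = 1/2."""
    rng = random.Random(seed)
    hits = sum(abs(sample_mean(n, rng) - 0.5) > lam for _ in range(trials))
    return hits / trials


def chebyshev_bound(n, lam, variance=1.0 / 12.0):
    """Tchebycheff bound variance / (n * lam^2); 1/12 is the uniform variance."""
    return variance / (n * lam * lam)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, deviation_frequency(n, 0.05), min(1.0, chebyshev_bound(n, 0.05)))
```

The simulated frequency should fall toward 0 as $n$ grows and should stay below the bound whenever the bound is less than 1, which is all that stochastic convergence requires.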


³ Throughout this paper, if $\gamma$ represents a set of conditions on chance variables, $P\{\gamma\}$
will denote the probability that those conditions are satisfied.

⁴ If $X_n = \frac{1}{n} \sum_{j=1}^{n} x_j$, then $E(X_n) = E(x)$ and
$E\{[X_n - E(x)]^2\} = \frac{1}{n} E\{[x - E(x)]^2\}$. Then if $\lambda$ is any positive number,
$P\{|X_n - E(x)| > \lambda\} \le \frac{E\{[x - E(x)]^2\}}{n \lambda^2}$, which implies (6).
There is also another kind of convergence, called convergence with
probability 1. The sequence $\{x_n\}$ converges with probability 1 to $x$ if

(7)  $\lim_{n \to \infty} P\{|x_n - x| \le \lambda, \ |x_{n+1} - x| \le \lambda, \ \dots, \ |x_{n+p} - x| \le \lambda\} = 1$

for every value of $p \ge 0$, uniformly in $p \ge 0$, for every positive number $\lambda$. If
$p = 0$ in (7), (7) becomes (6), so that convergence with probability 1 implies
stochastic convergence. Although the converse is not true, if $\{x_n\}$ is a sequence
of chance variables converging stochastically to $x$, there is a subsequence of
$\{x_n\}$ which converges with probability 1 to $x$.⁵ The usual limit theorems hold
here also: if $x_n$, $y_n$ converge with probability 1 to $x$, $y$, then $x_n + y_n$ converges with
probability 1 to $x + y$, etc.

An example of convergence with probability 1 is the following. If in the
previous example the hypothesis that $E(x^2)$ exists is removed, so that only the
weaker hypothesis of the existence of $E(x)$ is supposed, the Tchebycheff
inequality can no longer be applied, but a different method shows that
$\frac{1}{n} \sum_{j=1}^{n} x_j$ converges with probability 1 (and therefore stochastically) to $E(x)$.⁶
This result is known as the strong law of large numbers.

IV

Let $x, x_1, x_2, \dots$ be a sequence of chance variables with distribution functions
$F(x), F_1(x), F_2(x), \dots$ respectively. Then if $\lim_{n \to \infty} F_n(x) = F(x)$ for every value
of $x$, the distribution of $x_n$ is said to converge to a limiting distribution with
distribution function $F(x)$.

As an example, consider the Laplace-Liapounoff theorem. Let $x_1, x_2, \dots$
be a sequence of independent chance variables (i.e. any finite number of them
form an independent set) with the same distribution functions, and let $E(x_n)$,
$E(x_n^2)$ exist. We suppose that $\sigma^2 = E\{[x_n - E(x_n)]^2\} > 0$ so that the
distribution of $x_n$ is not merely confined to one point. Then the distribution of

(8)  $n^{-1/2} \sum_{j=1}^{n} [x_j - E(x_j)]$


⁵ The theories of probability and of measure are fundamentally identical. Chance
variables correspond to measurable functions. Stochastic convergence corresponds to
convergence in measure, and convergence with probability 1 corresponds to convergence
almost everywhere. The relation between these two types of convergence is discussed
(in the terminology of the measure theory) in E. W. Hobson, The Theory of Functions of
a Real Variable, second edition, Vol. 2, pp. 239-244.

⁶ Cf. for instance J. L. Doob, Transactions of the American Mathematical Society,
Vol. 36 (1934), pp. 764-765.
converges to a limiting distribution with distribution function⁷

(9)  $\frac{1}{\sigma \sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2 / 2\sigma^2} \, dt .$
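The convergence to the normal distribution function can be illustrated by simulation. This is a minimal sketch under stated assumptions, not the author's procedure: the variables are taken to be exponential with mean 1 and variance 1 (so $\sigma = 1$), and the function names are hypothetical. It compares the empirical distribution function of the normalized sum (8) with the normal distribution function (9).

```python
import math
import random


def normal_cdf(x, sigma=1.0):
    """Normal distribution function with mean 0 and variance sigma^2, as in (9)."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))


def normalized_sum(n, rng):
    """n^(-1/2) * sum of [x_j - E(x_j)] for exponential x_j with mean 1, as in (8)."""
    return sum(rng.expovariate(1.0) - 1.0 for _ in range(n)) / math.sqrt(n)


def empirical_cdf_at(x, n, trials=4000, seed=1):
    """Simulated frequency of the event {normalized sum < x}."""
    rng = random.Random(seed)
    return sum(normalized_sum(n, rng) < x for _ in range(trials)) / trials


if __name__ == "__main__":
    for x in (-1.0, 0.0, 0.5, 1.5):
        print(x, empirical_cdf_at(x, 200), normal_cdf(x))
```

For moderate $n$ the two columns should agree to within sampling error, although the skewness of the exponential distribution leaves a small discrepancy that shrinks as $n$ increases.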

The convergence of a sequence of $n$-variate distributions is defined as the
convergence of the distribution functions just as above for $n = 1$. Suppose
that $(x_{11}, \dots, x_{n1})$, $(x_{12}, \dots, x_{n2})$, $\dots$ are independent sets of chance
variables (i.e. the distribution function of any finite number of sets is the product
of the distribution functions of the sets) with the same distribution functions.
We suppose that $E(x_{j1})$, $E(x_{j1}^2)$ exist, $j = 1, \dots, n$, and that
$\sigma_j^2 = E\{[x_{j1} - E(x_{j1})]^2\} > 0$. Then if
$X_{jm} = m^{-1/2} \sum_{k=1}^{m} [x_{jk} - E(x_{jk})]$, the $n$-variate distribution
of $X_{1m}, \dots, X_{nm}$ converges to the normal distribution⁸ about zero means with
variances $\sigma_1^2, \dots, \sigma_n^2$ and correlation coefficients $\{\rho_{ij}\}$ where
$\sigma_i \sigma_j \rho_{ij} = E\{[x_{i1} - E(x_{i1})][x_{j1} - E(x_{j1})]\}$.

Three lemmas will be needed below in applying these concepts. 

Lemma 1. If $\{x_n\}$ is a sequence of chance variables whose distributions approach
a limiting distribution and if $\{y_n\}$ is a sequence of chance variables converging
stochastically to 0, the sequence $\{x_n y_n\}$ converges stochastically to 0.

For if $F(x)$ is the distribution function of the limiting distribution, and if $\lambda$,
$\mu$ are any positive numbers,

(10)  $P\{|x_n y_n| < \lambda\} \ge P\{|x_n y_n| < \lambda, \ |y_n| \le \mu\} \ge P\{|x_n| < \lambda/\mu, \ |y_n| \le \mu\}$
      $\ge P\{|y_n| \le \mu\} - P\{|x_n| \ge \lambda/\mu\} = -P\{|y_n| > \mu\} + P\{|x_n| < \lambda/\mu\}$
      $\ge -P\{|y_n| > \mu\} + P\{x_n < \lambda/\mu\} - P\{x_n < -\lambda/2\mu\}.$

Then, letting $n$ become infinite,

(11)  $\liminf_{n \to \infty} P\{|x_n y_n| < \lambda\} \ge F(\lambda/\mu) - F(-\lambda/2\mu).$

Letting $\mu$ approach 0, $F(\lambda/\mu)$ approaches 1, $F(-\lambda/2\mu)$ approaches 0, and the right
hand side becomes 1, as was to be proved.

Lemma 2. Let $\{x_n\}$, $\{y_n\}$, $\{z_n\}$ be sequences of chance variables such that the
distribution of $x_n$ approaches a limiting distribution with continuous distribution
function $F(x)$ and such that the sequences $\{y_n\}$, $\{z_n\}$ converge stochastically to 0,
1 respectively. Then the distributions of $\{x_n / z_n\}$¹⁰ and of $x_n + y_n$ approach
limiting distributions with the same distribution function $F(x)$.

⁷ A. Khintchine, Ergebnisse der Mathematik, Vol. 2, No. 4: Asymptotische Gesetze der
Wahrscheinlichkeitsrechnung, pp. 1-8.

⁸ Ibid., pp. 11-16.

⁹ If $\{a_n\}$ is a sequence of real numbers, $\limsup_{n \to \infty} a_n$ is defined as
$\lim_{n \to \infty} \{\text{least upper bound } a_n, a_{n+1}, \dots\}$, and $\liminf_{n \to \infty} a_n$ is defined as
$-\limsup_{n \to \infty} (-a_n)$. A necessary and sufficient condition that the sequence $\{a_n\}$
converge to a limit $a$ is that $\liminf_{n \to \infty} a_n = \limsup_{n \to \infty} a_n = a$.

¹⁰ Since $z_n$ converges stochastically to 1, the probability that $z_n = 0$ approaches 0. The
theorem is independent of the way $x_n / z_n$ is defined when $z_n = 0$.
Since $\frac{x_n}{z_n} = x_n + x_n \frac{1 - z_n}{z_n}$ (neglecting the possibility that $z_n$ may vanish),
where the last term converges stochastically to 0 by Lemma 1, it is sufficient
to prove the second part of the theorem. If $\epsilon > 0$, and if $x$ is an arbitrary
number,

(12)  $P\{x_n + y_n < x\} = P\{x_n + y_n < x, \ |y_n| \le \epsilon\} + P\{x_n + y_n < x, \ |y_n| > \epsilon\}.$

Since the sequence $\{y_n\}$ converges stochastically to 0,

(13)  $\lim_{n \to \infty} P\{x_n + y_n < x, \ |y_n| > \epsilon\} \le \lim_{n \to \infty} P\{|y_n| > \epsilon\} = 0$

so that in the limit the second term in (12) can be neglected. Moreover

(14)  $P\{x_n + y_n < x, \ |y_n| \le \epsilon\} \le P\{x_n < x + \epsilon\}.$

If we let $n$ become infinite and then let $\epsilon$ approach 0, (14) becomes

(15)  $\limsup_{n \to \infty} P\{x_n + y_n < x\} \le F(x).$

A similar argument shows that

(16)  $\liminf_{n \to \infty} P\{x_n + y_n < x\} \ge F(x),$

and (15), (16) taken together imply that

(17)  $\lim_{n \to \infty} P\{x_n + y_n < x\} = F(x),$

as was to be proved.
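Lemma 2 lends itself to a quick numerical check. The following sketch is an illustration under assumptions not made in the paper: $x_n$ is taken to be exactly normal with mean 0 and variance 1, while $y_n$ and $z_n - 1$ are uniform on $[-1, 1]$ scaled by $n^{-1/2}$, so that they converge stochastically to 0. The function names are hypothetical, and $n \ge 4$ is assumed so that $z_n$ stays away from 0.

```python
import math
import random


def normal_cdf(x):
    """Distribution function F(x) of the assumed limiting law, N(0, 1)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def lemma2_frequencies(n, x, trials=20000, seed=2):
    """Simulated P{x_n + y_n < x} and P{x_n / z_n < x} for the assumed sequences."""
    rng = random.Random(seed)
    count_sum = 0
    count_ratio = 0
    for _ in range(trials):
        xn = rng.gauss(0.0, 1.0)
        yn = rng.uniform(-1.0, 1.0) / math.sqrt(n)        # converges stochastically to 0
        zn = 1.0 + rng.uniform(-1.0, 1.0) / math.sqrt(n)  # converges stochastically to 1
        count_sum += (xn + yn) < x
        count_ratio += (xn / zn) < x
    return count_sum / trials, count_ratio / trials


if __name__ == "__main__":
    for n in (4, 100, 10000):
        print(n, lemma2_frequencies(n, 0.3), normal_cdf(0.3))
```

Both simulated frequencies should approach $F(0.3)$ as $n$ grows, in line with (17).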

Lemma 3. If $x_1, x_2, x_3, x_4$ are chance variables whose distribution has density
function

$\frac{1}{(2\pi)^2} e^{-\frac{1}{2}(x_1^2 + x_2^2 + x_3^2 + x_4^2)},$

the distribution of $z = x_1 x_2 - x_3 x_4$ has density function $\frac{1}{2} e^{-|z|}$.

The distribution of $u = x_1 x_2$ and that of $v = -x_3 x_4$ have the same density
function:


(18) 



Hence the distribution of z has density function 
If we change to polar coordinates, $t = r \cos\theta$, $\tau = r \sin\theta$, and integrate,
we obtain


1 

TT 


t/2 


drd9 


V 
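The conclusion of Lemma 3 can be checked by simulation. The sketch below is not part of the paper; it draws four independent normal variables with mean 0 and variance 1 and compares the empirical distribution function of $z = x_1 x_2 - x_3 x_4$ with the distribution function belonging to the density $\frac{1}{2} e^{-|z|}$. The function names are hypothetical.

```python
import math
import random


def product_difference(rng):
    """One draw of z = x1*x2 - x3*x4 for independent N(0, 1) variables."""
    x1, x2, x3, x4 = (rng.gauss(0.0, 1.0) for _ in range(4))
    return x1 * x2 - x3 * x4


def laplace_cdf(z):
    """Distribution function whose density is (1/2) * exp(-|z|)."""
    return 0.5 * math.exp(z) if z < 0 else 1.0 - 0.5 * math.exp(-z)


def empirical_cdf(z, trials=40000, seed=3):
    """Simulated frequency of the event {x1*x2 - x3*x4 < z}."""
    rng = random.Random(seed)
    return sum(product_difference(rng) < z for _ in range(trials)) / trials


if __name__ == "__main__":
    for z in (-2.0, -0.5, 0.0, 1.0, 2.5):
        print(z, empirical_cdf(z), laplace_cdf(z))
```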

Theorem 1. Let $x_1, x_2, x_3, x_4$ determine a 4-variate distribution with
distribution function $F(x_1, x_2, x_3, x_4)$. Suppose that $E(x_i)$, $E(x_i^2)$, $E(x_i^2 x_j^2)$ exist,
$i, j = 1, \dots, 4$, and suppose that $E(x_i) = 0$, $E(x_i^2) = 1$,¹¹ $E(x_i x_j) = \rho_{ij}$, $i, j = 1, 2, 3, 4$.
Let $x_{1j}, x_{2j}, x_{3j}, x_{4j}$ have the same 4-variate distribution as $x_1, x_2, x_3, x_4$, $j = 1, \dots, n$,
and let the $4n$-variate distribution function of $\{x_{ij}\}$ be
$\prod_{j=1}^{n} F(x_{1j}, x_{2j}, x_{3j}, x_{4j})$. We
shall use the following notation (which suppresses the dependence on $n$):

(20)  $\xi_i = \frac{1}{n} \sum_{k=1}^{n} x_{ik}, \qquad s_{ij} = \frac{1}{n} \sum_{k=1}^{n} x_{ik} x_{jk}.$

Let $\varphi$ be a function of $\xi_i$, $s_{ij}$ defined in a neighborhood $N$ of $P$: $\xi_i = 0$, $s_{ij} = \rho_{ij}$
which, together with its second partial derivatives, is continuous in $N$. Define
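$\sigma \ge 0$ by

(21)  $\sigma^2 = E\left\{\left[\sum_{i} \frac{\partial\varphi}{\partial\xi_i} x_i + \sum_{i \le j} \frac{\partial\varphi}{\partial s_{ij}} (x_i x_j - \rho_{ij})\right]^2\right\},$

where the partial derivatives are evaluated at $P$. Then if $\sigma > 0$, the distribution of
$\sqrt{n}[\varphi - \varphi(P)]$ (where $\varphi$ has the arguments $\xi_i$, $s_{ij}$) converges to a limiting
distribution which is normal with mean 0 and variance $\sigma^2$.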

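To prove this theorem we expand $\varphi$ in the neighborhood of $P$, obtaining

(22)  $\sqrt{n}[\varphi - \varphi(P)] = \sqrt{n} \sum_{i} \frac{\partial\varphi}{\partial\xi_i} \xi_i - \sqrt{n} \sum_{i \le j} \frac{\partial\varphi}{\partial s_{ij}} (\rho_{ij} - s_{ij}) + R_n$

where the partial derivatives are evaluated at $P$, and where $R_n$ consists of a
linear combination of $\sqrt{n}\,\xi_i \xi_j$, $\sqrt{n}\,\xi_i(\rho_{jk} - s_{jk})$, $\sqrt{n}(\rho_{ij} - s_{ij})(\rho_{kl} - s_{kl})$, with
coefficients which are uniformly bounded as long as $\xi_i$, $s_{ij}$ are in the
neighborhood $N$. Now

(23)  $\lim_{n \to \infty} \xi_i = 0, \qquad \lim_{n \to \infty} s_{ij} = \rho_{ij}$

with probability 1, by the law of large numbers, and as $n$ becomes infinite the
distributions of $\sqrt{n}\,\xi_i$, $\sqrt{n}(\rho_{ij} - s_{ij})$ converge to limiting distributions, by the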

¹¹ The hypothesis that $E(x_i) = 0$ involves no real restriction, since the general case can
be reduced to this one by substituting $x_i - E(x_i)$ for $x_i$. The hypothesis that $E(x_i^2) = 1$
can be met by substituting $x_i [E(x_i^2)]^{-1/2}$ whenever $E(x_i^2) > 0$, which will always be true
unless $x_i = 0$ with probability 1.
Laplace-Liapounoff theorem. Then by Lemma 1, the terms of $R_n$ converge
stochastically to 0. The other terms of $\sqrt{n}[\varphi - \varphi(P)]$ are sums to which the
Laplace-Liapounoff theorem can be applied, giving the desired conclusion.

As an example of the application of this theorem, we suppose that $\varphi$ is a
correlation coefficient:

(24)  $\varphi = \frac{s_{12}}{(s_{11} s_{22})^{1/2}}, \qquad \varphi(P) = \rho_{12}.$

Here $\sigma^2$ is $E\{[x_1 x_2 - \frac{1}{2}\rho_{12}(x_1^2 + x_2^2)]^2\}$ (which reduces to the familiar result $(1 - \rho_{12}^2)^2$
when the bivariate distribution of $x_1$, $x_2$ is normal) and $\sigma = 0$ only when, with
probability 1,

(25)  $2 x_1 x_2 = \rho_{12}(x_1^2 + x_2^2).$
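The reduction of $\sigma^2$ to $(1 - \rho_{12}^2)^2$ in the normal case can be confirmed by a Monte Carlo estimate of the expectation. The sketch below is illustrative and not from the paper: it assumes a bivariate normal pair with zero means, unit variances and correlation `rho`, generated from two independent normal variables, and the function name is hypothetical.

```python
import math
import random


def sigma_squared_mc(rho, trials=100000, seed=4):
    """Monte Carlo estimate of E{[x1*x2 - (rho/2)*(x1^2 + x2^2)]^2}."""
    rng = random.Random(seed)
    c = math.sqrt(1.0 - rho * rho)
    total = 0.0
    for _ in range(trials):
        x1 = rng.gauss(0.0, 1.0)
        x2 = rho * x1 + c * rng.gauss(0.0, 1.0)  # correlation of x1, x2 equals rho
        t = x1 * x2 - 0.5 * rho * (x1 * x1 + x2 * x2)
        total += t * t
    return total / trials


if __name__ == "__main__":
    for rho in (0.0, 0.3, 0.6, 0.9):
        print(rho, sigma_squared_mc(rho), (1.0 - rho * rho) ** 2)
```

The two printed columns should agree to within sampling error for each value of `rho`.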


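As a second example we suppose that $\varphi$ is a tetrad difference:

(26)  $\varphi = \frac{s_{13} s_{24} - s_{14} s_{23}}{(s_{11} s_{22} s_{33} s_{44})^{1/2}}, \qquad \varphi(P) = \rho_{13}\rho_{24} - \rho_{14}\rho_{23}.$

Here $\sigma^2$ becomes

(27)  $\sigma^2 = E\left\{\left[\rho_{24} x_1 x_3 + \rho_{13} x_2 x_4 - \rho_{14} x_2 x_3 - \rho_{23} x_1 x_4 - \frac{1}{2}(\rho_{13}\rho_{24} - \rho_{14}\rho_{23})(x_1^2 + x_2^2 + x_3^2 + x_4^2)\right]^2\right\}$

and $\sigma = 0$ only when the quantity in the brackets vanishes with probability 1.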

If in either of the two above cases $s_{ij} - \xi_i \xi_j$ is substituted for $s_{ij}$ (i.e. if the
deviations from the sample mean, not those from the true mean, are used), the
result is unaltered. This is true in general, since the partial derivatives
$\partial\varphi / \partial s_{ij}$ are unaltered at $P$ by this substitution.

There is a well-known $\delta$-method used in statistics to find limiting variances
of statistics of the type covered by Theorem 1,¹² and Theorem 1 shows an
interpretation which can be given to the results obtained by this method.

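We now investigate the necessary modification of Theorem 1 if $\sigma = 0$, i.e. if

(28)  $\sum_{i} \frac{\partial\varphi}{\partial\xi_i} x_i + \sum_{i \le j} \frac{\partial\varphi}{\partial s_{ij}} (x_i x_j - \rho_{ij}) = 0$

with probability 1. If we assume that $\varphi$ has continuous third partial
derivatives in the neighborhood $N$, we find that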


¹² Examples of the use of this method can be found in T. L. Kelley, Crossroads in the
Mind of Man, Stanford University (1928), pp. 49-60, and in an article by S. Wright, Annals
of Mathematical Statistics, Vol. 5 (1934), p. 211.
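(29)  $n[\varphi - \varphi(P)] = \frac{n}{2}\left[\sum \frac{\partial^2\varphi}{\partial\xi_i \partial\xi_j} \xi_i \xi_j + 2\sum \frac{\partial^2\varphi}{\partial\xi_i \partial s_{jk}} \xi_i (s_{jk} - \rho_{jk}) + \sum \frac{\partial^2\varphi}{\partial s_{ij} \partial s_{kl}} (s_{ij} - \rho_{ij})(s_{kl} - \rho_{kl})\right] + R_n'$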


where $R_n'$ converges stochastically to 0. The second degree terms constitute
a quadratic form in $\{\xi_i, \ s_{jk} - \rho_{jk}\}$. Now the multivariate distribution of
$\{\sqrt{n}\,\xi_i, \ \sqrt{n}(s_{jk} - \rho_{jk})\}$, by the Laplace-Liapounoff theorem, converges to a
normal distribution whose variances and correlation coefficients are those of
$x_i$, $x_j x_k$. The distribution of $n[\varphi - \varphi(P)]$ thus converges to the distribution
of the quadratic form


(30) 


n 

2 




t,] , k 


9|.' 9s, A 


9*cp 


5<, , 


where (ai, A^} have the multivariate distribution just described, unless the 
quadratic form vanishes identically. This reasoning can be continued, the
general result being that there is some power $\nu$ of $n$, if $\varphi$ is sufficiently regular,
such that the distribution of $n^{\nu}[\varphi - \varphi(P)]$ converges to a limiting distribution.

When $\sigma = 0$ in the second example, unless the distribution of $x_1, x_2, x_3, x_4$
is confined with probability 1 to a 4-dimensional quadric, $\rho_{13} = \rho_{14} = \rho_{23} =
\rho_{24} = 0$. Equation (29) becomes

(29')  $n[\varphi - \varphi(P)] = n(s_{13} s_{24} - s_{14} s_{23}) + R_n'.$


Now if $x_1$, $x_2$ are transformed by a linear homogeneous transformation with
determinant $\Delta$, it is readily seen that $s_{13} s_{24} - s_{14} s_{23}$ is multiplied by $\Delta$. The
same is true of $x_3$, $x_4$. If $x_1$, $x_2$ are transformed into $x_1'$, $x_2'$ so that $E(x_i'^2) = 1$,
$E(x_1' x_2') = 0$, the determinant of the transformation is $\pm(1 - \rho_{12}^2)^{-1/2}$. Then
transforming each pair $(x_1, x_2)$, $(x_3, x_4)$ in this way into $(x_1', x_2')$, $(x_3', x_4')$, the
variables $x_1', x_2', x_3', x_4'$ are uncorrelated. If $s_{ij}' = \frac{1}{n} \sum_{k=1}^{n} x_{ik}' x_{jk}'$,

(31)  $s_{13}' s_{24}' - s_{14}' s_{23}' = \frac{s_{13} s_{24} - s_{14} s_{23}}{\pm(1 - \rho_{12}^2)^{1/2} (1 - \rho_{34}^2)^{1/2}}.$


The limiting distribution of $n(s_{13}' s_{24}' - s_{14}' s_{23}')$ is the distribution of
$\xi_{13}\xi_{24} - \xi_{14}\xi_{23}$, where these four chance variables are normally distributed,
$E(\xi_{13}) = E(\xi_{24}) = E(\xi_{14}) = E(\xi_{23}) = 0$, $E(\xi_{ij}^2) = E(x_i'^2 x_j'^2)$,
$E(\xi_{ij}\xi_{kl}) = E(x_i' x_j' x_k' x_l')$. Now if
$x_1, x_2, x_3, x_4$ are normally distributed (the most important case for statistical
purposes), $x_1', x_2', x_3', x_4'$ will also be distributed normally, and the vanishing of
the correlation coefficients means that the chance variables are independent.
If this is true

(32)  $E(\xi_{ij}^2) = 1, \qquad E(\xi_{ij}\xi_{kl}) = 0 \quad (\xi_{ij} \ne \xi_{kl}).$
Evidently, however, $x_1', x_2', x_3', x_4'$ do not have to be independent to make these
equations valid. It is more than sufficient if the pairs $(x_1, x_2)$, $(x_3, x_4)$ and
therefore the pairs $(x_1', x_2')$, $(x_3', x_4')$ are independent. If (32) is true, the $\xi$'s are
independent, each one being normally distributed with mean 0 and variance 1.
Summarizing these results, and using Lemma 3: If $\varphi$ is the tetrad difference and
if $\rho_{13} = \rho_{14} = \rho_{23} = \rho_{24} = 0$, the distribution of $n[\varphi - \varphi(P)]$ converges to a limiting
distribution. If in addition the distribution of $x_1, x_2, x_3, x_4$ is normal, or if the
pairs $(x_1, x_2)$, $(x_3, x_4)$ are independent, this limiting distribution has density function

$\frac{c}{2} e^{-c|x|}$

where $c = (1 - \rho_{12}^2)^{-1/2} (1 - \rho_{34}^2)^{-1/2}$.
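The limiting law of the tetrad difference can be examined numerically in the simplest case. The sketch below is an illustration under assumptions that go beyond the text: the four variables are taken to be independent and normal with mean 0 and variance 1, so that all $\rho_{ij}$ vanish and $c = 1$, and the sample size and number of trials are arbitrary choices. Under these assumptions the frequency of $|n\varphi| < 1$ should be close to $1 - e^{-1}$, the value given by the density $\frac{1}{2} e^{-|x|}$. The function names are hypothetical.

```python
import math
import random


def scaled_tetrad(n, rng):
    """Return n * phi, with phi the tetrad difference (26) of one sample of size n."""
    s11 = s22 = s33 = s44 = 0.0
    s13 = s24 = s14 = s23 = 0.0
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        x2 = rng.gauss(0.0, 1.0)
        x3 = rng.gauss(0.0, 1.0)
        x4 = rng.gauss(0.0, 1.0)
        s11 += x1 * x1
        s22 += x2 * x2
        s33 += x3 * x3
        s44 += x4 * x4
        s13 += x1 * x3
        s24 += x2 * x4
        s14 += x1 * x4
        s23 += x2 * x3
    # Raw sums are used: the factors 1/n of (20) cancel between numerator
    # and denominator of phi.
    phi = (s13 * s24 - s14 * s23) / math.sqrt(s11 * s22 * s33 * s44)
    return n * phi


def fraction_within(n, bound, trials=1500, seed=5):
    """Simulated frequency of the event |n * phi| < bound."""
    rng = random.Random(seed)
    return sum(abs(scaled_tetrad(n, rng)) < bound for _ in range(trials)) / trials


if __name__ == "__main__":
    print(fraction_within(100, 1.0), 1.0 - math.exp(-1.0))
```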

Wilks has investigated the case where $x_1, x_2, x_3, x_4$ are normally and
independently distributed, and in this case found the exact variance of the tetrad
difference as a function of $n$.¹³

¹³ Proceedings of the National Academy of Sciences, Vol. 18 (1932), pp. 562-565.
ON THE POSTULATE OF THE ARITHMETIC MEAN
By Richmond T. Zoch 
Introduction 

Suppose n observations have been made of an unknown quantity. It is de- 
sired to know the most probable value of the unknown. When Gauss gave his 
development of the so-called Normal Law of Error, he assumed that the Arithmetic 
Mean of the n observations is the most probable value. The question arises: Can 
this postulate be justified? 

In the excellent book, entitled “Calculus of Observations,” by Whittaker and
Robinson¹ there is given a proof which purports to deduce the postulate of the
Arithmetic Mean from assumptions of a more elementary nature. This proof
is not correct.

Since this book has had wide circulation, it is believed that the errors in this 
proof should be called to the attention of the users of the book. The present 
paper has been prepared for this purpose. The first part of this paper points 
out the questionable features of the proof given in Whittaker and Robinson's 
book. The second part gives some critical comments on the original sources 
from which Whittaker and Robinson obtained their proof. 

Part 1 

The assumptions on which Whittaker and Robinson based their proof of the 
postulate of the Arithmetic Mean are: 

Axiom I. The differences between the most probable value and the indi- 
vidual measures do not depend on the position of the null-point from which 
they are reckoned. 

Axiom II. The ratio of the most probable value to any individual measure
does not depend on the unit in terms of which the measures are reckoned.

Axiom III. The most probable value is independent of the order in which the
measurements are made, and so is a symmetric function of the measures.

Axiom IV. The most probable value, regarded as a function of the individual 
measures, has one-valued and continuous first derivatives with respect to them. 

It is fairly easy to show that if the Arithmetic Mean is the most probable 
value, then the above four axioms follow as conclusions. The converse, viz. if 
the above four axioms be assumed then the Arithmetic Mean is the most prob- 
able value, however, is not true. That is to say the above assumptions are 

¹ The Calculus of Observations by E. T. Whittaker and G. Robinson, Blackie & Son, Ltd.,
London (1920), pp. 215-217.
necessary conditions, but not sufficient conditions. For, consider the following
function of the measures:

$\frac{\mu_3}{\mu_2} = \frac{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^3}{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2}$

where $\bar{x}$ is the Arithmetic Mean of the $x_i$.

Clearly this function is a symmetric function of the measures ($x_i$) and
therefore satisfies Axiom III. If the $x_i$ are each multiplied by $k$ then the Arithmetic
Mean ($\bar{x}$) is also multiplied by $k$ and we have

$\frac{\frac{1}{n} \sum_{i=1}^{n} (k x_i - k\bar{x})^3}{\frac{1}{n} \sum_{i=1}^{n} (k x_i - k\bar{x})^2} = k \frac{\mu_3}{\mu_2},$

that is to say, if we multiply the individual measures by $k$ it is the same as
multiplying the function $\mu_3/\mu_2$ by $k$ and therefore the ratio of any individual measure
to the most probable value (function) does not depend on the unit used. Hence
the function $\mu_3/\mu_2$ satisfies Axiom II.
y-i 

The partial derivative of $\mu_3/\mu_2$ with respect to $x_1$ is

$\frac{\partial}{\partial x_1}\left(\frac{\mu_3}{\mu_2}\right) = \frac{\mu_2 \frac{\partial\mu_3}{\partial x_1} - \mu_3 \frac{\partial\mu_2}{\partial x_1}}{\mu_2^2} = \frac{3\mu_2[(x_1 - \bar{x})^2 - \mu_2] - 2\mu_3(x_1 - \bar{x})}{n \mu_2^2}$

since $\frac{\partial\bar{x}}{\partial x_1} = \frac{1}{n}$, and $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$. The partial derivatives of $\mu_3/\mu_2$ with respect
to each of the $x_i$ are of the same literal form and clearly these partial derivatives
are single valued and continuous. Therefore the function $\mu_3/\mu_2$ satisfies Axiom IV.

Now it can be shown that if $h$ be added to each $x_i$, then the function $\mu_3/\mu_2$ is
unchanged and hence this function does not satisfy Axiom I. (It should be
noted that the function $\mu_3/\mu_2$ is invariant under the transformation specified by
Axiom I.) However, consider the function $\bar{x} + a\,\mu_3/\mu_2 = f$, where $a$ is a constant
independent of the $x_i$. Clearly, $f$ satisfies all of the four axioms.
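The claim that $f = \bar{x} + a\,\mu_3/\mu_2$ satisfies Axioms I to III while differing from the Arithmetic Mean can be verified in exact rational arithmetic. The sketch below is an illustration, not part of the paper; the function names and the sample values are hypothetical.

```python
from fractions import Fraction


def central_moment(xs, r):
    """r-th moment of the measures xs about their Arithmetic Mean."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** r for x in xs) / n


def zoch_f(xs, a=1):
    """f = mean + a * mu_3 / mu_2, evaluated exactly with Fractions."""
    xs = [Fraction(x) for x in xs]
    mean = sum(xs) / len(xs)
    return mean + a * central_moment(xs, 3) / central_moment(xs, 2)


if __name__ == "__main__":
    xs = [1, 3, 4]
    h, k = 7, Fraction(5, 2)
    print(zoch_f(xs))                                     # value of f
    print(zoch_f([x + h for x in xs]) == zoch_f(xs) + h)  # Axiom I
    print(zoch_f([k * x for x in xs]) == k * zoch_f(xs))  # Axiom II
    print(zoch_f([4, 1, 3]) == zoch_f(xs))                # Axiom III
    print(zoch_f(xs) == Fraction(sum(xs), len(xs)))       # f is not the mean
```

Axiom IV concerns the derivatives and is not tested here; it holds wherever $\mu_2 \ne 0$, by the formula for the partial derivative.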

Thus a function, distinct from the Arithmetic Mean, has here been exhibited
which satisfies the four axioms given in Whittaker and Robinson’s book. Hence,
these four axioms are not sufficient to establish the postulate of the Arithmetic
Mean. The question arises: Where is the proof given by Whittaker and Robinson
lacking in rigor? The proof given is essentially as follows. (No part of the
proof given by Whittaker and Robinson is here omitted; in fact, for the sake of
rigor and careful reasoning, further explanations are given and the various steps
are numbered.)

(1) Suppose the most probable value is expressed in terms of the $n$ measures
$x_1, x_2, \dots, x_n$ by the function $\phi(x_1, x_2, \dots, x_n)$; that is to say the most probable
value is some function, $\phi$, of the observations, or: the most probable value
$\equiv \phi(x_1, x_2, \dots, x_n)$.

(2) By the theorem of the mean value in the differential calculus, which by
Axiom IV is applicable, we have

$\phi(kx_1, kx_2, \dots, kx_n) = \phi(0, 0, \dots, 0) + kx_1\left[\frac{\partial\phi}{\partial x_1}\right] + \cdots + kx_n\left[\frac{\partial\phi}{\partial x_n}\right]$

where the square brackets denote that every $x_i$ is to be replaced by $\theta k x_i$ where $\theta$
lies between 0 and 1.

(3) By Axiom II, the left hand side $= k\phi(x_1, x_2, \dots, x_n)$.

(4) By the continuity of $\phi$, postulated in Axiom IV, the equation
$\phi(kx_1, kx_2, \dots, kx_n) = k\phi(x_1, x_2, \dots, x_n)$ must hold in the limit when $k$ is 0,
that is $\phi(0, 0, \dots, 0) = 0$.

(5) We now have

$k\phi(x_1, x_2, \dots, x_n) = kx_1\left[\frac{\partial\phi}{\partial x_1}\right] + \cdots + kx_n\left[\frac{\partial\phi}{\partial x_n}\right]$

or on dividing by $k$,

$\phi(x_1, x_2, \dots, x_n) = x_1\left[\frac{\partial\phi}{\partial x_1}\right] + \cdots + x_n\left[\frac{\partial\phi}{\partial x_n}\right].$

(6) In this last equation let $k \to 0$: then each of the quantities $\left[\frac{\partial\phi}{\partial x_i}\right]$ tends
to a value which is independent of the $x$’s and we can write $\phi(x_1, x_2, \dots, x_n) =
c_1 x_1 + \cdots + c_n x_n$ where the $c$’s are independent of the $x$’s.

(7) By Axiom III the $c$’s must all be equal, so

$\phi(x_1, x_2, \dots, x_n) = c(x_1 + x_2 + \cdots + x_n).$

(8) From Axiom I we have

$\phi(x_1 + h, x_2 + h, \dots, x_n + h) = \phi(x_1, x_2, \dots, x_n) + h.$

(9) If in this last equation we let the $x_i$ all approach zero then we have $cnh = h$
and therefore $c = \frac{1}{n}$ and finally

$\phi(x_1, x_2, \dots, x_n) = \frac{1}{n}(x_1 + x_2 + \cdots + x_n)$

which states that $\phi$ = the most probable value = the Arithmetic Mean.

It should be noted that the first six steps involve only Axioms II and IV. Of
these first six steps the second and sixth are questionable.

The sixth step involves the tacit assumption that the partial derivatives are
functions of $k$. These partial derivatives are not necessarily functions of $k$ and
the example given above, viz., $f = \bar{x} + a\,\mu_3/\mu_2$, is a function whose partial
derivatives are independent of $k$; in fact no function of the form

1 «n 

F s X + 2 

2 (x, - x)»-‘ 

.”1 

will satisfy the tacit assumption involved in the sixth step ; nor is F the most gen- 
eral function which will not satisfy the tacit assumption, thus take for example 


+ . 

Consider now the second step. Take the function $\phi(y_1, y_2, \dots, y_n) =
k\phi(x_1, x_2, \dots, x_n)$. Then, by Axiom II, we have $y_i = kx_i$. Apply the Theorem
of the Mean Value to $\phi(y_i)$ instead of $\phi(x_i)$. Then

$\phi(y_1, y_2, \dots, y_n) = \phi(0, 0, \dots, 0) + y_1\left[\frac{\partial\phi}{\partial y_1}\right] + \cdots + y_n\left[\frac{\partial\phi}{\partial y_n}\right].$

Now if we replace $y_i$ by $kx_i$
we obtain the equation given in the second step except that the square brackets
are now of the form $\left[\frac{\partial\phi}{\partial(kx_i)}\right]$ and not $\left[\frac{\partial\phi}{\partial x_i}\right]$ as given by
Whittaker and Robinson. It is difficult to decide whether by $\left[\frac{\partial\phi}{\partial x_i}\right]$ Whittaker
and Robinson mean

$\left[\frac{\partial\phi(kx_1, kx_2, \dots, kx_n)}{\partial x_i}\right]$ or $\left[\frac{\partial\phi(x_1, x_2, \dots, x_n)}{\partial x_i}\right].$

These last two expressions are not equal. To make the second step more clear
it is necessary to demonstrate that

$\left[\frac{\partial\phi(kx_1, kx_2, \dots, kx_n)}{\partial(kx_i)}\right] = \left[\frac{\partial\phi(x_1, x_2, \dots, x_n)}{\partial x_i}\right]$
and this has not been done. In order to demonstrate this equality further use
must be made of Axiom II. It appears that the questionable features of the
second step may be overcome by starting with the equation implied by Axiom
II, thus

$\phi(kx_1, kx_2, \dots, kx_n) = k\phi(x_1, x_2, \dots, x_n);$

in other words $\phi$ is a homogeneous function of degree 1. Therefore use can be
made of Euler’s Theorem on homogeneous forms. In this way we obtain:

$\phi = \sum_{i=1}^{n} x_i \frac{\partial\phi}{\partial x_i}$

which is an abbreviation of the last equation given in the fifth step.
Now, making further use of Axiom II we have:

$\frac{\partial\phi(kx_1, kx_2, \dots, kx_n)}{\partial(kx_i)} = \frac{\partial}{\partial(kx_i)}\, k\phi(x_1, x_2, \dots, x_n) = \frac{1}{k} \frac{\partial}{\partial x_i}\, k\phi(x_1, x_2, \dots, x_n).$

It follows that

$\frac{\partial\phi(x_1, x_2, \dots, x_n)}{\partial x_i} = \frac{\partial\phi(kx_1, kx_2, \dots, kx_n)}{\partial(kx_i)}.$

From this development we conclude that for any function whatever which
satisfies Axiom II the last equation of the fifth step cannot possibly involve $k$.

In order to overcome the defect in the sixth step it is necessary to make a more
restrictive assumption. If in place of Axiom IV, we assume that "The most
probable value, regarded as a function of the individual measures, has first partial
derivatives with respect to them which are constant," then the equation given in the
sixth step can be rigorously established.

After the equation of the sixth step is rigorously established there remains an
objection in the seventh step. The axioms do not explicitly state that the $n$
observations must be functionally independent. Therefore suppose the $x_i$ are
functionally dependent according to the relation $x_i = y_i z$ where the $y_i$ are all
constant. Then the function $f = \bar{x} + \mu_3/\mu_2$ will have partial derivatives with
respect to the $x_i$ which are unequal and constant; yet at the same time the
function $f$ is a symmetrical expression of the $n$ variables.

Hence in order to establish the postulate of the Arithmetic Mean along the 
lines followed by Whittaker and Robinson it is necessary to make another restric- 
tive assumption slightly different from that proposed in the last paragraph but 
one, and assume (in addition to Axioms I and II) that the function has partial 
derivatives with respect to the Xi which are equal. 
Part 2

The first original paper consulted was one by Schiaparelli.² In this paper nine
propositions are presented, four of which are also called lemmas. From a strict
mathematical point of view the four propositions which Schiaparelli calls lemmas
are really postulates. Schiaparelli discusses these four lemmas at length; three
of these lemmas are the first three axioms given in Whittaker and Robinson’s
book. The fourth one is: “When, in the function $\phi$, all the variables ($x_i$) take
the same value $a$, the function itself becomes equal to $a$.” (This, as a matter of
fact, is the definition of an average.)

In his discussion of these lemmas, which are based partly on practical and 
partly on philosophical grounds, Schiaparelli points out that they are justified 
from the practical or statistical nature of the problem involved in arriving at the 
most probable value (Schiaparelli uses the term “true value”) of a set of obser- 
vations. In the present writer’s opinion, these discussions are the most excel- 
lent part of Schiaparelli’s paper. These discussions are even more significant in 
view of the fact that the later writers on this subject make no attempt whatso- 
ever to justify the use of their postulates. 

Schiaparelli remarks that we should have no reason for not expecting that a
small change in a single observation should produce a small change in the
function $\phi$; but he does not make this remark in the form of an explicit postulate.
This could have been done and, moreover, such a postulate of continuity could
be justified from the practical nature of the problem. It seems that a more
elegant procedure would have been to deduce the continuity of the function and
its derivatives from Axioms I and II. It will be shown later that this is possible.
From his remark on the continuity of the function, Schiaparelli concludes that
the partial derivatives of $\phi$ with respect to the $x_i$ exist and are continuous. His
method of arriving at this conclusion is not valid, for it is well known that an
arbitrarily assumed function may be everywhere continuous and yet possess a
derivative at no point.

Schiaparelli’s Proposition III states: “When in the function $\phi$ all the $x_i$ take
the same value, then the $\frac{\partial\phi}{\partial x_i}$ become equal to each other.” This Proposition is
false. To show this, consider the function $f = \bar{x} + \mu_3/\mu_2$, where the

$\frac{\partial f}{\partial x_i} = \frac{1}{n} + \frac{3\mu_2[(x_i - \bar{x})^2 - \mu_2] - 2\mu_3(x_i - \bar{x})}{n \mu_2^2}.$

² Giovanni Schiaparelli, Come si possa giustificare l’uso della media aritmetica nel
calcolo dei risultati d’osservazione, Rendiconti Reale Istituto Lombardo di Scienze e Lettere,
Vol. XL (1907), pp. 752-764.
Now, when the $x_i$ all approach $a$ then both $f$ and $\frac{\partial f}{\partial x_i}$ become indeterminate
forms. However, in this case $f$ takes an indeterminate form which can be
evaluated and it can be shown that $\mu_3/\mu_2$ will always have the value zero, i.e., $f$
will have the value $a$ when all the $x_i = a$; while the $\frac{\partial f}{\partial x_i}$ can take any value
whatever and in general the $\frac{\partial f}{\partial x_i}$ will not be equal when the $x_i \to a$. To
illustrate: Consider the observations $y_1 = 1$, $y_2 = 3$, $y_3 = 4$; then $\bar{y} = 8/3$ and
$\mu_2 = 14/9$ and $\mu_3 = -20/27$ whence $f = 8/3 - 10/21$. Now assume that these
three observations all approach 2 in a certain way, i.e., let $x_i = 2 + (y_i - 2)z$.
Then $\bar{x} = 2 + (\bar{y} - 2)z = 2 + (2/3)z$,

$\mu_2(x_i) = \frac{z^2}{n} \sum (y_i - \bar{y})^2 = (14/9)z^2$

and

$\mu_3(x_i) = \frac{z^3}{n} \sum (y_i - \bar{y})^3 = (-20/27)z^3$

whence $f = 2 + (2/3)z - (10/21)z$. Clearly as $z \to 0$ the $x_i \to 2$ and $f \to 2$.
However,

$\left.\frac{\partial f}{\partial x_1}\right|_{x_i = 2 + (y_i - 2)z} = \frac{1}{3} + \frac{131}{294}, \qquad \left.\frac{\partial f}{\partial x_2}\right|_{x_i = 2 + (y_i - 2)z} = \frac{1}{3} - \frac{253}{294}, \qquad \left.\frac{\partial f}{\partial x_3}\right|_{x_i = 2 + (y_i - 2)z} = \frac{1}{3} + \frac{122}{294}.$

Thus the $\frac{\partial f}{\partial x_i}$ are not functions of $z$ and as the $x_i \to 2$ the $\frac{\partial f}{\partial x_i}$ remain constant
and unequal.
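The arithmetic of this illustration can be reproduced exactly. The sketch below is not part of the paper; it evaluates the formula for $\partial f / \partial x_i$ (with $a = 1$) in rational arithmetic along the path $x_i = 2 + (y_i - 2)z$ and confirms that the derivatives do not depend on $z$. The function names are hypothetical.

```python
from fractions import Fraction


def partials(xs):
    """Partial derivatives of f = mean + mu_3/mu_2 with respect to each x_i."""
    xs = [Fraction(x) for x in xs]
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return [
        Fraction(1, n)
        + (3 * m2 * ((x - mean) ** 2 - m2) - 2 * m3 * (x - mean)) / (n * m2 ** 2)
        for x in xs
    ]


def path(ys, z):
    """Observations x_i = 2 + (y_i - 2) * z, which all approach 2 as z -> 0."""
    return [2 + (Fraction(y) - 2) * z for y in ys]


if __name__ == "__main__":
    ys = [1, 3, 4]
    for z in (Fraction(1), Fraction(1, 10), Fraction(1, 1000)):
        print(z, partials(path(ys, z)))
```

For every $z \ne 0$ the three derivatives are the same unequal fractions, and their sum is 1, in agreement with Schiaparelli’s Proposition V.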

From his conclusion that the derivatives of $\phi$ exist and from Axiom I,
Schiaparelli obtains the equation $\sum_{i=1}^{n} \frac{\partial\phi}{\partial x_i} = 1$ (this equation being his Proposition
V) in the following way: Since the derivatives of $\phi$ exist, then, by the Theorem
of the mean value,

(A)  $\phi(x_1 + h, x_2 + h, \dots, x_n + h) = \phi(x_1, x_2, \dots, x_n) + h\left(\frac{\partial\phi}{\partial x_1} + \frac{\partial\phi}{\partial x_2} + \cdots + \frac{\partial\phi}{\partial x_n}\right).$

By Axiom I:

$\phi(x_1 + h, x_2 + h, \dots, x_n + h) = \phi(x_1, x_2, \dots, x_n) + h.$

Whence $\sum_{i=1}^{n} \frac{\partial\phi}{\partial x_i} = 1$. Now this equation is correct but the above proof of it
is not convincing. Clearly, according to the Theorem of the mean value, in
equation (A) it is necessary to replace each $x_i$ in the $\frac{\partial\phi}{\partial x_i}$ by $x_i + \theta h$ where $\theta$ is
between 0 and 1.


Schiaparelli’s Proposition VII states in effect that the $\frac{\partial\phi}{\partial x_i}$ are invariant under
the transformation $x_i' = x_i + h$ where $h$ is constant, and his Proposition IX
states that the $\frac{\partial\phi}{\partial x_i}$ are invariant under the transformation $x_i' = kx_i$ where $k$ is
a constant. These two propositions are correct and are correctly established.
Making use of his Propositions III (which is false), V, VII and IX, Schiaparelli
proceeds to the establishment of the postulate of the Arithmetic Mean, as
follows:

Let $a = \phi(x_i)$. As the $x_i$ vary, then $a$ varies but for a particular set of $x_i$
then $a$ is a constant. Now by Axiom I we have

$a + (m - 1)a = \phi(x_1 + (m - 1)a, \ x_2 + (m - 1)a, \ \dots, \ x_n + (m - 1)a) = ma$

for all values of $m > 1$. Then by Axiom II:

$a = \phi\left(\frac{x_1 + (m - 1)a}{m}, \ \frac{x_2 + (m - 1)a}{m}, \ \dots, \ \frac{x_n + (m - 1)a}{m}\right) = \phi\left(\frac{x_1 - a}{m} + a, \ \frac{x_2 - a}{m} + a, \ \dots, \ \frac{x_n - a}{m} + a\right).$

And by Propositions VII and IX, the $\frac{\partial\phi}{\partial x_i}$ are unchanged during the above
transformations. Hence the last equation is true when $m \to \infty$ and by Proposition
III (false) the $\frac{\partial\phi}{\partial x_i} = \frac{1}{n}$ as when $x_i = a$. In this final proof Schiaparelli
gives a geometric illustration of each step.

It is both interesting and strange to know that in closing his paper
Schiaparelli does not claim that the Arithmetic Mean is the only function which
will satisfy all of his postulates. In fact he himself points out that the
function $\phi$, implicitly defined by the equation $\sum_{i=1}^{n} (\phi - x_i)^m = 0$ where $m$ is an
odd integer $> 1$ will satisfy all of his postulates. Furthermore he points out
that this function will not satisfy his Proposition III. Schiaparelli’s object
was to establish the postulate of the Arithmetic Mean without any appeal to
the concept of probability. To accomplish this he made four assumptions each
of which he justified by a priori reasoning. Then he proceeded with the above
proof. Why he should have been satisfied with his own proof after perceiving
the function defined by $\sum_{i=1}^{n} (\phi - x_i)^m = 0$ is hard to understand.
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The second paper® consulted was also by Schiaparelli. It is merely an 
abridged form of the one just discussed. Schiaparelli wrote two earlier papers on 
this same subject (altogether Schiaparelli wrote four papers on it) but it was 
inferred from the footnotes in his paper, which has just been discussed at length, 
that it contained all of the material of the two earlier papers with which he him- 
self was satisfied. Therefore Schiaparelli’s two earlier papers were not con- 
sulted. 

The third paper consulted was that by Broggi.⁴ Broggi states that the purpose of his paper is to establish the postulate of the Arithmetic Mean by purely analytic methods which are more brief than Schiaparelli's method. Broggi words the assumptions upon which he bases his proof as follows:

1. $\phi$ is a symmetric function of its $n$ variables;

2. The partial derivatives are single-valued and finite;

3. We have $\phi(kx_1, kx_2, \ldots, kx_n) = k\,\phi(x_1, x_2, \ldots, x_n)$;

4. We have $\phi(x_1 + h, x_2 + h, \ldots, x_n + h) = \phi(x_1, x_2, \ldots, x_n) + h$, that is to say, by virtue of 2:

$$\frac{\partial \phi}{\partial x_1} + \frac{\partial \phi}{\partial x_2} + \cdots + \frac{\partial \phi}{\partial x_n} = 1. \tag{a}$$


Broggi does not explain why he used the postulate 2 but presumably it was in order to exclude the function defined by $\sum_{i=1}^{n} (\phi - x_i)^m = 0$. Consider the special case where $m = 3$. Then $n\phi^3 - 3\phi^2 \sum x_i + 3\phi \sum x_i^2 - \sum x_i^3 = 0$. Let

$$p = 3\left(\frac{1}{n}\sum x_i^2 - \frac{1}{n^2}\Bigl(\sum x_i\Bigr)^2\right) \quad\text{and}\quad q = \frac{3}{n^2}\sum x_i \sum x_i^2 - \frac{2}{n^3}\Bigl(\sum x_i\Bigr)^3 - \frac{1}{n}\sum x_i^3.$$

Also put $R = (p/3)^3 + (q/2)^2$ and let $A$ be the real cube root of $-q/2 + \sqrt{R}$ and $B$ be the real cube root of $-q/2 - \sqrt{R}$. Then the three branches of $\phi$ can be explicitly written

$$\phi_1 = A + B + \frac{1}{n}\sum x_i,$$
$$\phi_2 = \omega A + \omega^2 B + \frac{1}{n}\sum x_i,$$
$$\phi_3 = \omega^2 A + \omega B + \frac{1}{n}\sum x_i,$$

where $\omega$ and $\omega^2$ are the two complex cube roots of unity. Now while $\phi$ does not satisfy the postulate that the function be single valued, $\phi_1$ satisfies this postulate as well as all the others and so does $\phi_2$ and also $\phi_3$. Hence, Broggi's failure to comment at length on the function $\sum_{i=1}^{n} (\phi - x_i)^m = 0$ is unsatisfying. As a matter of fact Broggi fails to point out any of the defects of Schiaparelli's paper, with the possible exception that he shows Schiaparelli's postulate which states $\phi = a$ when each of the $x_i = a$ to be a consequence of Axioms I and II. This is done so casually that it makes one wonder whether Broggi really was aware of the fact that Schiaparelli's postulates are not independent.

³ Giovanni Schiaparelli, Come si possa giustificare l'uso della media aritmetica nel calcolo delle misure, senza fare alcuna ipotesi sulla legge di probabilità degli errori accidentali, Astronomische Nachrichten, Band 176 (1907) pp. 206-212.

⁴ Ugo Broggi, Sur Le Principe De La Moyenne Arithmétique, L'Enseignement Mathématique, XI (1909) pp. 14-17.
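The cubic case above can be checked numerically. The following is a minimal sketch, not part of any of the papers discussed: it evaluates the real branch $\phi_1 = A + B + \frac{1}{n}\sum x_i$ from the expressions for $p$, $q$ and $R$ given above. The function names `real_cbrt` and `phi_cardano` and the sample values are hypothetical, chosen only for illustration.

```python
import math

def real_cbrt(v):
    """Real cube root, valid for negative arguments too."""
    return math.copysign(abs(v) ** (1.0 / 3.0), v)

def phi_cardano(xs):
    """Real branch phi_1 of sum((phi - x)**3) = 0, via p, q, R, A, B."""
    n = len(xs)
    s1 = sum(xs)
    s2 = sum(x * x for x in xs)
    s3 = sum(x ** 3 for x in xs)
    p = 3.0 * (s2 / n - s1 ** 2 / n ** 2)
    q = 3.0 * s1 * s2 / n ** 2 - 2.0 * s1 ** 3 / n ** 3 - s3 / n
    # R is non-negative because p >= 0; clamp tiny negative rounding noise.
    big_r = max((p / 3.0) ** 3 + (q / 2.0) ** 2, 0.0)
    a = real_cbrt(-q / 2.0 + math.sqrt(big_r))
    b = real_cbrt(-q / 2.0 - math.sqrt(big_r))
    return a + b + s1 / n

xs = [0.0, 0.0, 7.0]                      # illustrative data only
phi = phi_cardano(xs)
print(phi, sum(xs) / len(xs))             # phi_1 versus the arithmetic mean
print(phi_cardano([x + 2.5 for x in xs]) - phi)   # shift by h adds h
print(phi_cardano([3.0 * x for x in xs]) / phi)   # scaling by k multiplies by k
```

For these sample values the real branch differs from the arithmetic mean, yet it shifts by $h$ when every $x_i$ is shifted by $h$ and scales by $k$ when every $x_i$ is scaled by $k$, which is the behaviour the text attributes to it.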

Broggi proves the Lemma: "A homogeneous function of the first degree which is a solution of the equation of partial derivatives (a) is an integral function." This Lemma is correct and is correctly proved but its wording is apt to be misleading; in fact it appears that its true meaning was not clear to Broggi himself. For, while the function $\phi$ cannot be of the form $\frac{\psi}{\chi}$ where $\psi$ is a homogeneous function of the $p$-th degree which satisfies Axiom I and $\chi$ a homogeneous function of the $(p-1)$-th degree which also satisfies Axiom I, the Lemma does not mean and Broggi has not proved that $\phi$ cannot be of the form $\phi = U + \frac{\psi}{\chi}$ where $U$ is an integral function satisfying Axioms I and II and $\psi$ and $\chi$ are homogeneous functions of the $p$-th and $(p-1)$-th degrees respectively which are invariant under the transformation specified in Axiom I. By reason of this oversight, Broggi concludes that any function satisfying Axioms I and II must be linear in its $n$ variables, a conclusion which is erroneous.

The fourth paper consulted was that by Schimmack.⁵ Schimmack's paper is in three sections. The first section contains the proof which is essentially that which Whittaker and Robinson give. In the second section Schimmack gives a different proof, from a set of new postulates. The new set of postulates is:

Axiom I' = Axiom I.

Axiom II': The most probable value is independent of the sense of direction of the scale upon which the observed values (and the most probable value) are reckoned, that is to say,

$$\phi(-x_1, -x_2, \ldots, -x_n) = -\phi(x_1, x_2, \ldots, x_n).$$

Axiom III' = Axiom III.

Axiom IV': If from $n$ observed values, the most probable value be computed and if one obtains an additional observed value then the most probable value of the $n + 1$ observed values is the same as the most probable value of $n + 1$ quantities consisting of the initial most probable value counted $n$ times and the $(n + 1)$-th observed value, namely:

$$\phi_{n+1}(x_1, \ldots, x_n, x_{n+1}) = \phi_{n+1}(\phi_n, \ldots, \phi_n, x_{n+1}).$$
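Axiom IV' can be tried out on concrete estimators. The sketch below is an illustration only, assuming a hypothetical helper `satisfies_axiom_iv_prime` and made-up observations; it compares the two sides of the Axiom IV' identity for the arithmetic mean and, as a contrast, for the median.

```python
from statistics import mean, median

def satisfies_axiom_iv_prime(phi, xs, x_new, tol=1e-12):
    """Compare phi(x_1..x_n, x_new) with phi(phi_n, ..., phi_n, x_new)."""
    phi_n = phi(xs)                                # value from the first n observations
    lhs = phi(list(xs) + [x_new])                  # all n + 1 observations
    rhs = phi([phi_n] * len(xs) + [x_new])         # phi_n counted n times, plus x_new
    return abs(lhs - rhs) <= tol

observations = [1.0, 2.0, 6.0]                     # illustrative values only
print(satisfies_axiom_iv_prime(mean, observations, 10.0))
print(satisfies_axiom_iv_prime(median, observations, 10.0))
```

The mean passes the check for these values while the median does not, which is consistent with the remark below that Axiom IV' is itself a property of the Arithmetic Mean.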

In explaining the object of this second section, Schimmack says that postulating the existence of the derivatives (Axiom IV) seems unjustified and ought to be avoided and only such axioms made which the intrinsic character of the problem justifies. In connection with this statement of Schimmack's it appears that the intrinsic character of the problem certainly does not justify Axiom IV'. In fact, Axiom IV' appears to be quite artificial. Moreover, Schimmack does not attempt to justify Axiom IV' by a priori reasoning as Schiaparelli does for Axioms I, II, and III. While, if the Arithmetic Mean is the most probable value, Axiom IV' follows, since it is a property of the Arithmetic Mean, it does not seem to be in keeping with the intrinsic character of the problem to use this property as a starting point for later deductions.

⁵ Rudolf Schimmack, Der Satz vom arithmetischen Mittel in axiomatischer Begründung, Mathematische Annalen, Band 68 (1909) pp. 125-132, 304.

As regards Schimmack's objections to Axiom IV, all of the conditions specified by it can be deduced from the first two Axioms except that the derivatives must be single-valued. To show that this is true, consider an arbitrary function which satisfies Axioms I and II. Let this function be $\phi(x_1, x_2, \ldots, x_n)$. We do not know that $\phi$ is continuous or that $\phi$ has any derivatives. All we assume is that $\phi$ satisfies the first three Axioms and it is here proven that $\phi$ must be continuous and have continuous partial derivatives. By Axiom I we can give increments to the $x_i$; hence we give each $x_i$ the same increment, $\Delta x$, and then subtract $\phi$ and we have:

$$\phi(x_1 + \Delta x, x_2 + \Delta x, \ldots, x_n + \Delta x) - \phi(x_1, x_2, \ldots, x_n) = \Delta\phi,$$

but by Axiom I, $\Delta\phi = \Delta x$. Therefore $\frac{\Delta\phi}{\Delta x} = 1 = \frac{d\phi}{dx}$. In other words, the total derivative of $\phi$ exists and is constant. Therefore the total derivative of $\phi$ is continuous. But since the total derivative exists, all of the partial derivatives exist. By Axiom II, $\phi$ is a homogeneous function of the first degree. Applying Euler's Theorem for homogeneous forms, we have

$$\phi = x_1 \frac{\partial \phi}{\partial x_1} + x_2 \frac{\partial \phi}{\partial x_2} + \cdots + x_n \frac{\partial \phi}{\partial x_n}.$$

Since the total derivative of $\phi$ is everywhere continuous, $\phi$ is also everywhere continuous. Thus, the right hand side of the above equation is everywhere continuous and each partial derivative is therefore everywhere continuous.


As regards that part of Axiom IV which requires the $\frac{\partial \phi}{\partial x_i}$ to be single valued, it would seem more satisfactory to postulate that the function $\phi$ is single-valued,
for the single-valuedness of a derivative does not insure the single-valuedness 
of the integral while the single-valuedness of a function does insure the single- 
valuedness of the derivative where the derivative exists. 

In the third section of his paper, Schimmack shows Axioms I, II, III, and IV 
to be independent, and likewise Axioms I, II', III and IV'. 

Schimmack does not mention any of the questionable features of Schiaparelli's 
and Broggi's papers. 

The fifth paper consulted was that by Suto.⁶ Suto's assumptions are:

1°. $\phi(x, x, \ldots, x) = x$ (This is Schiaparelli's).

2°. $\phi(x_1 + y_1, x_2 + y_2, \ldots, x_n + y_n) - \phi(x_1, x_2, \ldots, x_n)$ depends on the values of $y_1, y_2, \ldots, y_n$ only.

3°. = Axiom III = Axiom III'.

⁶ Onosaburo Suto, Law of the Arithmetical Mean, Tohoku Mathematical Journal, Vol. 6 (1914) pp. 79-81.

Suto says he believes these assumptions to be more simple and natural than Schimmack's Axioms I'-IV'. However, assumption 2° appears to be quite artificial and very restrictive. Suto does not even attempt to justify it by a priori reasoning.

Suto shows his three Axioms to be independent. It is interesting to know that 
Suto has established the postulate of the Arithmetic Mean rigorously using only 
three postulates while Schiaparelli, Broggi and Schimmack failed using four 
postulates. In this connection it should be observed that when Axiom IV as 
given by Whittaker and Robinson is replaced by “The most probable value, 
regarded as a function of the individual measures, has first partial derivatives 
with respect to them which are equal” as suggested at the end of Part 1, then 
Axiom III can be deduced as a consequence of Axioms I, II and the reworded
Axiom IV, so that three Axioms only are sufficient to deduce the postulate of the
Arithmetic Mean. However, it would be difficult to justify the reworded Axiom
IV from the nature of this problem of the Arithmetic Mean. 

Suto does not point out any of the defects of the preceding papers. 

The last paper consulted was that by Beetle.⁷ It deals with the third section
of Schimmack's paper. Beetle also fails to point out any of the defects of the 
preceding papers. 

Conclusion 

The postulate of the Arithmetic Mean can be rigorously established, without 
the use of the concept of probability, if sufficiently restrictive assumptions are 
made. The writers making sufficiently restrictive assumptions have failed to 
justify the use of them. Several proofs of the postulate of the Arithmetic 
Mean are clearly erroneous. The existing attempts to establish the postulate of 
the Arithmetic Mean without any appeal to the concept of probability are, 
therefore, unsatisfactory.

THE SHRINKAGE OF THE BROWN-SPEARMAN PROPHECY FORMULA

By Robert J. Wherry


At the recent meeting of the Conference on Individual Psychological Differences
held in Washington, Dr. Clark Hull of Yale University called attention to
the fact that the much used Brown-Spearman formula involves, or leads to, if
used without regard to certain limitations, a certain over-optimism.¹ In other
words, if only this formula is taken into account, one would assume that the mere 
increasing in length of a test would automatically and, with continued increases 
in length, indefinitely continue to increase its reliability or validity. 

On the other hand, we know that the greater the number of test units the 
greater the shrinkage between the predicted and actually obtained value. At 
least we know this to be true when the value in question is a multiple correlation 
coefficient and the test units are independent variables. Hull raised the question 
as to whether or not the same fact might be true of the figures predicted by the 
Brown-Spearman formula. It is the purpose of this article to show that this 
shrinkage does occur, and that the Wherry-Smith shrinkage formula² satisfactorily
predicts this shrinkage.

A quick review of the nature of the two formulae (the Brown-Spearman and the Wherry-Smith formulae) will at once show the importance of the discussion. The Brown-Spearman formula, as applied to the predicting of reliability, reads as follows,

$$R = \frac{M r_{11}}{1 + (M - 1)\, r_{11}}, \tag{1}$$

where $R$ = the predicted reliability,
$r_{11}$ = the discovered reliability,
and $M$ = the number of times the test is lengthened. Thus the formula provides that the predicted reliability ($R$) will increase with each increase in $M$, but it is to be noted that the increase in $R$ decreases with each increase in $M$ as the value of $R$ approaches its limit of plus one.

On the other hand the Wherry-Smith formula, which reads,

$$\bar{R}^2 = \frac{(N - 1)R^2 - (M - 1)}{N - M}, \tag{2}$$

where $\bar{R}$ = the predicted value of the correlation,
$R$ = the discovered correlation,
$M$ = the number of independent variables
and $N$ = the statistical population (the number of cases), provides that, for each increase in $M$, the shrinkage in $\bar{R}$ as compared with $R$ increases. Thus, if we assume that the $M$'s in the two formulae are analogous, i.e., if we assume the Wherry-Smith formula to be applicable to the Brown-Spearman formula, we see that as $M$ increases the Brown-Spearman formula adds a decreasing increment while the Wherry-Smith formula provides that an increasing decrement be subtracted; thus eventually we arrive at a point where by further increasing the length of the test we will decrease rather than increase the size of the reliability coefficient.

TABLE I

Correlations Observed and Theoretical (Based upon Observed Means)
(N = 37 throughout)

                Observed    Correlation predicted           Error
      M         average     Br.-Sp.     Wherry       Br.-Sp.     Wherry

  (Trait 1)
      1          .290
      5          .728        .671        .618       [illeg.]      -.110
     10          .717        .803        .726       [illeg.]       .009
     15          .754        .860        .758       [illeg.]       .004
     20          .806        .891        .825       [illeg.]       .020
     30          .936        .925        .609       [illeg.]      -.427
  (Trait 5)
      1          .419
      5          .736        .783        .761       [illeg.]     [illeg.]
     10          .845        .878        .834       [illeg.]     [illeg.]
     15          .887        .915        .856       [illeg.]     [illeg.]
     20          .877        .936        .856         .068       [illeg.]
     30          .876        .966        .746         .080       [illeg.]
  (Trait 10)
      1          .354
      5          .479        .733        .692         .254         .213
     10          .717        .846        .788         .129         .071
     15          .862        .892        .816         .040        -.036
     20          .636        .915        .822         .279         .186
     30          .806        .943        .665         .138        -.160
  (All Traits)
      1          .320
     20          .898        .904        .822       [illeg.]      -.076
     30          .872        .933        .576       [illeg.]      -.296

If our hypothesis be true, we must, then, in order to predict the correct value of $\bar{R}$, substitute the value of equation (1) in equation (2). Doing this we have

$$\bar{R}^2 = \frac{(N - 1)M^2 r_{11}^2 - (M - 1)^3 r_{11}^2 - 2(M - 1)^2 r_{11} - (M - 1)}{(N - M)\,[1 + 2(M - 1)r_{11} + (M - 1)^2 r_{11}^2]}, \tag{3}$$

which would then be the form in which the Brown-Spearman formula should be used in predicting reliability corrected for chance error by the Wherry-Smith formula. The same result can of course be secured by applying the formulae consecutively.

TABLE II

Error in Predicting Reliability (Based upon Observed Means)

       Error             Brown-Spearman     Wherry
   over .210                   2               1
    .151 to  .210              -               1
    .091 to  .150              3               -
    .031 to  .090              8               1
   -.029 to  .030              3               6
   -.089 to -.030              1               3
   -.149 to -.090              -               3
   -.209 to -.150              -               -
   below -.209                 -               2

TABLE III

Rietz Criteria of Normality Applied to Results from Means

   Criterion     Normal     Brown-Spearman     Wherry
     μ₁            0            .074            -.032
     β₁            0            .561            -.283
     β₂            3           2.008            3.180
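Formulas (1), (2) and (3) are easy to evaluate side by side. The following is a minimal sketch under the definitions given above; the function names are hypothetical and the inputs in the usage lines are the Trait 1 values from Table I ($r_{11} = .290$, $N = 37$).

```python
import math

def brown_spearman(r11, m):
    """Equation (1): predicted reliability of a test lengthened m times."""
    return m * r11 / (1 + (m - 1) * r11)

def wherry_smith_sq(r, m, n):
    """Equation (2): shrunken squared correlation for m variables, n cases."""
    return ((n - 1) * r ** 2 - (m - 1)) / (n - m)

def corrected_sq(r11, m, n):
    """Equation (3): equation (1) substituted into equation (2)."""
    num = ((n - 1) * m ** 2 * r11 ** 2 - (m - 1) ** 3 * r11 ** 2
           - 2 * (m - 1) ** 2 * r11 - (m - 1))
    den = (n - m) * (1 + 2 * (m - 1) * r11 + (m - 1) ** 2 * r11 ** 2)
    return num / den

r11, n = 0.290, 37
for m in (5, 10, 15):
    predicted = brown_spearman(r11, m)
    shrunken = math.sqrt(corrected_sq(r11, m, n))
    print(m, round(predicted, 3), round(shrunken, 3))
```

For $M = 5$ this gives about .671 and .618, the Brown-Spearman and Wherry entries of Table I for Trait 1. Applying `wherry_smith_sq` to the output of `brown_spearman` gives the same number as `corrected_sq`, which is the "applying the formulae consecutively" remark in code form.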

In order to test the formula (3), the writer has applied it to some empirical data. A recent article by H. H. Remmers of Purdue University furnishes the needed data. Remmers' study dealt with the increase in reliability due to increase in the number of judgments of certain traits of college professors.³ His results, together with the results of applying formula (3) to the data, are shown in Table I.

An inspection of Table I shows at once that while the Brown-Spearman formula gives results which are consistently too large (16 out of 17 times) the Wherry-Smith formula gives results which are more nearly equally distributed between positive and negative errors (7 to 10), tending to slightly underestimate. The actual distribution of errors can be more easily seen by an inspection of Table II.

TABLE IV

Correlations Observed and Theoretical (Based upon Observed Medians)
(N = 37 throughout)

                Observed    Correlation predicted           Error
      M         medians     Br.-Sp.     Wherry       Br.-Sp.     Wherry

  (Trait 1)
      1          .344
      5          .752        .724        .682       [illeg.]     [illeg.]
     10          .663        .840        .779       [illeg.]       .116
     15          .702        .887        .807         .185         .105
     20          .805        .913        .805         .108         .000
     30          .936        .940        .635         .004        -.301
  (Trait 5)
      1          .460
      5          .760        .804        .776       [illeg.]       .016
     10          .856        .891        .852       [illeg.]      -.004
     15          .931        .925        .873       [illeg.]      -.058
     20          .877        .942        .874         .065        -.003
     30          .876        .961        .778         .085        -.098
  (Trait 10)
      1          .363
      5          .433        .740        .701         .307         .268
     10          .764        .851        .795         .097         .041
     15          .872        .895        .822         .023        -.060
     20          .898        .919        .820         .021        -.078
     30          .872        .945        .669         .078        -.203
  (All Traits)
      1          .503
     20          .898        .953        .879         .055       [illeg.]
     30          .872        .968        .829         .096       [illeg.]

Now, if our formula were perfectly correct, we should expect that the errors incurred by its use would be normally distributed about a mean error of zero. The Rietz criteria for normality of distribution were applied to these errors with results as shown in Table III.⁴ It can be readily seen that the Wherry correction formula gave much better results than did the uncorrected Brown-Spearman formula when measured by the Rietz criteria.

All of the results in the first three tables are based upon the means of the results obtained by Remmers, since this was the method used in his paper. However, when the number of cases is small, as it was in this study, it is sometimes preferable to use the median rather than the mean as a basis of calculation, since the median is less affected by extreme cases. The writer has therefore recalculated the problem on the basis of the medians discovered by Remmers, and the results are given in Tables IV, V, and VI. The results were found to differ but little from those based upon the means of the distributions.

TABLE V

Error in Predicting Reliability (Based upon Observed Medians)

       Error             Brown-Spearman     Wherry
   over .210                   1               1
    .151 to  .210              2               -
    .091 to  .150              3               2
    .031 to  .090              5               1
   -.029 to  .030              6               5
   -.089 to -.030              -               5
   -.149 to -.090              -               1
   -.209 to -.150              -               1
   below -.209                 -               1

TABLE VI

Rietz Criteria of Normality Applied to Results from Medians

   Criterion     Normal     Brown-Spearman     Wherry
     μ₁            0            .074            -.018
     β₁            0            .497            -.081
     β₂            3           1.699            2.284

If we now assume that the formula (3) has been empirically established and justified, we must still answer a very practical question, namely, "How long shall we make our tests in order to achieve the greatest reliability?" To answer this question we must find the point at which $\bar{R}$ becomes a maximum, with respect to changes in $M$, assuming $r_{11}$ and $N$ to be constant terms. To find this point we must find the derivative of equation (3) with respect to $M$ and set the numerator equal to zero. Thus, if we write Formula (3) in a slightly more usable form, we have

$$\bar{R}^2 = \frac{(N - 1)M^2 r_{11}^2}{(N - M)(1 + 2[M - 1]r_{11} + [M - 1]^2 r_{11}^2)} - \frac{M - 1}{N - M}, \tag{4}$$

whence

$$\frac{d\bar{R}^2}{dM} = \frac{-(N - 1)(1 - r_{11})(1 - r_{11} + M r_{11})\{4 r_{11}^2 M^2 - (2N r_{11}^2 - 3 r_{11}[1 - r_{11}])M + (1 - r_{11})^2\}}{(N - M)^2 (1 + 2[M - 1]r_{11} + [M - 1]^2 r_{11}^2)^2}, \tag{5}$$

which causes $\bar{R}$ to reach a maximum or minimum when the numerator is placed equal to zero. Thus, placing the numerator equal to zero and factoring this equation, we find its roots to be

$$M = -\frac{1 - r_{11}}{r_{11}}, \tag{5a}$$

$$M = \frac{2N r_{11} - 3(1 - r_{11}) - \sqrt{4N^2 r_{11}^2 - 12N r_{11}(1 - r_{11}) - 7(1 - r_{11})^2}}{8 r_{11}}, \tag{5b}$$

or

$$M = \frac{2N r_{11} - 3(1 - r_{11}) + \sqrt{4N^2 r_{11}^2 - 12N r_{11}(1 - r_{11}) - 7(1 - r_{11})^2}}{8 r_{11}}, \tag{5c}$$

and by substituting actual values of $N$ and $r_{11}$ in the equations, we find that equation (5c) is the root we are seeking (i.e., the value of $M$ for which $\bar{R}$ becomes a maximum).

It can also be readily seen that the value under the radical approximates a perfect square (lacking 16 units of being that figure) of the quantity outside of the radical, thus approximating this value for large values of $N$. Thus, when $N$ is large (exceeds 100) we may secure satisfactory approximations to $M$ if we rewrite equation (5c) in the form below

$$M \text{ (approximately)} = \frac{N}{2} - \frac{3(1 - r_{11})}{4 r_{11}}. \tag{5d}$$

TABLE VII

Showing the value of M which will give a maximum value for $\bar{R}$
(According to the Brown-Spearman-Wherry-Smith formula)

                                          r11
    N     .10      .20      .30      .40      .50      .60       .70      .80      .90
   10    Imag.    Imag.      3        4        4        4         5        5        5
   20    Imag.      6        8        9        9    [illeg.]  [illeg.]    10       10
   30    Imag.     12       13       14       14       14     [illeg.]    15       15
   40     11       17       18       19       19       19        20       20       20
   50     17       22       23       24       24       24        25       25       25
   60     22       27    [illeg.]    29       29       29        30       30       30
   70     27       32       33       34       34       34        35       35       35
   80     32       37       38       39       39       39        40       40       40
   90     38       42       43       44       44       44        45       45       45
  100     43       47       48       49       49       49        50       50       50


Table VII shows the results of equation (5c) for values between $N = 10$ and
$N = 100$ (by increments of 10) for values of $r_{11}$ from .10 to .90 (by increments of
.10). The use of the formula does not yield integers, and so the results in the 
table are recorded to the nearest whole number rather than exactly as given by 
the formula. 
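Roots (5c) and (5d) can be evaluated directly for any $N$ and $r_{11}$. The sketch below is illustrative only; the function names are hypothetical, and returning `None` for a negative quantity under the radical is an assumption standing in for the "Imag." entries of Table VII.

```python
import math

def optimal_length(r11, n):
    """Equation (5c): the M that maximizes the corrected reliability.

    Returns None when the quantity under the radical is negative.
    """
    disc = 4 * n ** 2 * r11 ** 2 - 12 * n * r11 * (1 - r11) - 7 * (1 - r11) ** 2
    if disc < 0:
        return None
    return (2 * n * r11 - 3 * (1 - r11) + math.sqrt(disc)) / (8 * r11)

def optimal_length_approx(r11, n):
    """Equation (5d): large-N approximation to equation (5c)."""
    return n / 2 - 3 * (1 - r11) / (4 * r11)

# Recompute a few rows in the layout of Table VII (rounded to whole numbers).
for n in (10, 20, 40):
    row = []
    for r11 in (0.10, 0.20, 0.30, 0.40, 0.50):
        m = optimal_length(r11, n)
        row.append("Imag." if m is None else str(round(m)))
    print(n, row)
```

With $N = 10$ and $r_{11} = .30$ the exact root is about 2.76, which rounds to the 3 shown in Table VII, and with $N = 10$ and $r_{11} = .10$ the radicand is negative, matching the "Imag." entry.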

If, in order to test the validity of formula (5c), we apply it to the values in 
Tables I and IV, we find fairly close agreement. The formula in each case predicts
a maximum value for $\bar{R}$ when $M$ lies between 15 and 20, and in the actually
lengthened tests R is found to be a maximum when M is 30, 15, 15, 20, 30, 15, 20, 
and 20, thus being in agreement six times out of eight. 


Conclusions 

1. The Brown-Spearman formula appears to give results which contain both 
constant and chance errors. 

2. These errors can be practically eliminated by applying the Wherry-Smith correction formula to the results obtained by the Brown-Spearman formula.

3. We may find the value of $M$ which will give the greatest value of $\bar{R}$ by substitution in equation (5c) above, and then by substitution of this value in equation (3), find the most probable value of $\bar{R}$ at its maximum point.

4. For large values of $N$ we may secure satisfactory approximations to $M$ by means of the simpler formula (5d).

THE LIKELIHOOD TEST OF INDEPENDENCE IN CONTINGENCY TABLES

By S. S. Wilks


J. Neyman and E. S. Pearson¹ have applied the principle of the ratio of likelihoods to the problem of determining criteria for testing various hypotheses about the group frequencies in problems dealing with grouped data. In particular, they have discussed the fundamental $\chi^2$ problem, the test of goodness of fit, the hypothesis that two samples of grouped data are from the same population, and the hypothesis of independence in contingency tables. In their treatment of these problems, these authors have started from the limiting form of the probability of an observed set of frequencies and have shown that approximately each of the appropriate $\lambda$'s is a function of the minimum value of a corresponding $\chi^2$. The distribution of this minimum value is found, from which the significance test is made.

In certain cases the exact values of the $\lambda$'s are relatively simple functions of the observations which can be as conveniently calculated as the corresponding $\chi^2$'s. The purpose of this note is to consider the exact expressions for the $\lambda$'s and find their asymptotic distributions in large samples for the following hypotheses: (1) that a sample of grouped data is from a population with specified group frequencies (i.e., the fundamental $\chi^2$ problem), (2) that several samples of grouped data are from the same population, and (3) that there is independence in a contingency table.


1. The fundamental $\chi^2$ problem. Let $p_1, p_2, \ldots, p_k$ be the probabilities of the mutually exclusive events $E_1, E_2, \ldots, E_k$ respectively. In a sample of $N$ events the probability that $E_1, E_2, \ldots, E_k$ will occur $n_1, n_2, \ldots, n_k$ times respectively, is given by

$$C = \frac{N!}{n_1!\, n_2! \cdots n_k!}\; p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}. \tag{1}$$

If we let $\Omega$ be the class of all sets of values of the $p$'s such that their sum is unity, there is only one set of $p$'s that maximize $C$, namely, $p_j = n_j/N$ $(j = 1, 2, \ldots, k)$. The maximum of $C$ is

$$C(\Omega \max) = \frac{N!}{n_1!\, n_2! \cdots n_k!} \left(\frac{n_1}{N}\right)^{n_1} \left(\frac{n_2}{N}\right)^{n_2} \cdots \left(\frac{n_k}{N}\right)^{n_k}. \tag{2}$$

¹ Biometrika, vol. 20A (1928), pp. 263-294.

The likelihood of the hypothesis that the sample is from a population specified by $p$'s having the values $p_1, p_2, \ldots, p_k$ is defined as

$$\lambda_s = \frac{C}{C(\Omega \max)} = \left(\frac{N p_1}{n_1}\right)^{n_1} \left(\frac{N p_2}{n_2}\right)^{n_2} \cdots \left(\frac{N p_k}{n_k}\right)^{n_k}. \tag{3}$$

$\lambda_s$ is a quantity which clearly lies between 0 and 1. It will be 1 only when $p_j = n_j/N$ $(j = 1, 2, \ldots, k)$ (that is, when the hypothesis is rigorously supported by the sample) and tends to 0 as the sample values $n_j/N$ diverge more and more from the hypothetical values $p_j$. The problem of making an exact test of significance of an observed value of $\lambda_s$ would involve the computation of all terms of form (1) the $n$'s of which make $\lambda_s$ less than the observed value of $\lambda_s$. This, of course, is impracticable except perhaps for the binomial case with small values of $N$. However, if the $n$'s are large we can find an approximate solution. If we let $x_j = \frac{n_j - N p_j}{\sqrt{N}}$ then except for terms of order $1/\sqrt{N}$ and higher, the $x$'s are distributed according to the law

$$F = \frac{1}{\sqrt{(2\pi)^{k-1} p_1 p_2 \cdots p_k}}\; e^{-\frac{1}{2}\sum_j x_j^2/p_j}, \tag{4}$$

where $\sum_j x_j = 0$. Neglecting terms of order $1/\sqrt{N}$ and higher we easily find (using natural logarithms) $-2 \log \lambda_s = \sum_j \frac{x_j^2}{p_j}$. Therefore, if $\theta = -2 \log \lambda_s$, $\theta$ is approximately distributed according to the function

$$\frac{1}{2^{\frac{k-1}{2}}\, \Gamma\!\left(\frac{k-1}{2}\right)}\; \theta^{\frac{k-3}{2}}\, e^{-\frac{\theta}{2}}\, d\theta, \tag{5}$$

which is the $\chi^2$ distribution with $k - 1$ degrees of freedom.

Since we have neglected terms of order $1/\sqrt{N}$ in obtaining (4) there is no theoretical reason why $\chi^2$ should be used in preference to $-2 \log \lambda_s$ as the criterion for testing the hypothesis that the sample is from a population specified by $p_1, p_2, \ldots, p_k$. Any practical advantage which $-2 \log \lambda_s$ may have will therefore justify its use.
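The two criteria of this section can be compared on a small example. The sketch below is an illustration under the definitions above, with $-2 \log \lambda_s$ taken from equation (3) and $\chi^2 = \sum_j x_j^2/p_j$; the function names and the sample counts are hypothetical.

```python
import math

def neg2_log_lambda_s(counts, probs):
    """-2 log(lambda_s) from equation (3); empty cells contribute nothing."""
    total = sum(counts)
    return 2.0 * sum(n * math.log(n / (total * p))
                     for n, p in zip(counts, probs) if n > 0)

def pearson_chi2(counts, probs):
    """chi^2 = sum_j (n_j - N p_j)^2 / (N p_j), i.e. sum_j x_j^2 / p_j."""
    total = sum(counts)
    return sum((n - total * p) ** 2 / (total * p)
               for n, p in zip(counts, probs))

counts = [18, 55, 27]            # illustrative frequencies, N = 100
probs = [0.25, 0.50, 0.25]       # hypothetical specified group probabilities
print(neg2_log_lambda_s(counts, probs), pearson_chi2(counts, probs))
```

Both statistics are zero when the observed proportions equal the specified $p_j$ exactly, and both would be referred to the $\chi^2$ distribution with $k - 1$ degrees of freedom (here 2).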


2. The hypothesis that several samples of grouped data are from a common population. Let $p_{i1}, p_{i2}, \ldots, p_{is}$ be the probabilities with which the mutually exclusive events $E_{i1}, E_{i2}, \ldots, E_{is}$ occur, where $\sum_j p_{ij} = 1$ $(i = 1, 2, \ldots, r)$. Then in a sample of $N_i$ events the chance that $E_{i1}, E_{i2}, \ldots, E_{is}$ will occur $n_{i1}, n_{i2}, \ldots, n_{is}$ times respectively is given by an expression similar to (1). The chance of the joint occurrence of the $r$ samples is

$$\frac{N_1!\, N_2! \cdots N_r!}{n_{11}!\, n_{12}! \cdots n_{rs}!}\; p_{11}^{n_{11}} p_{12}^{n_{12}} \cdots p_{rs}^{n_{rs}}. \tag{6}$$

We are interested in testing the hypothesis that the $r$ samples are from the same population, that is, that the $r$ sets of $p$'s $p_{i1}, p_{i2}, \ldots, p_{is}$ $(i = 1, 2, \ldots, r)$ are the same. The likelihood criterion $\lambda_c$ appropriate to this hypothesis is the ratio of the maximum ($\omega(\max)$) of (6) subject to the condition that the sets of $p$'s are the same (that is, $p_{ij} = p_j$ say, $i = 1, 2, \ldots, r$; $j = 1, 2, \ldots, s$) to the maximum ($\Omega(\max)$) of (6) without this restriction.

For convenience let the observations be arranged in table form so that $n_{ij}$ is the frequency in the $i$-th row and $j$-th column. Let $n_{i\cdot}$ and $n_{\cdot j}$ be the totals of the $i$-th row and $j$-th column respectively, and $N$ the total of all observations. Thus $n_{i\cdot}$ is the same as $N_i$. The expression for $\lambda_c$ will be

$$\lambda_c = \frac{\prod_i n_{i\cdot}^{\,n_{i\cdot}}\; \prod_j n_{\cdot j}^{\,n_{\cdot j}}}{N^N\, \prod_{i,j} n_{ij}^{\,n_{ij}}}. \tag{7}$$

It can be shown analytically that $\lambda_c$ lies between 0 and 1. It can be 1 only when $\frac{n_{1j}}{N_1} = \frac{n_{2j}}{N_2} = \cdots = \frac{n_{rj}}{N_r}$, $j = 1, 2, \ldots, s$, that is, when the hypothesis of a common population is perfectly substantiated by the samples. Because of the fact that the $n_{ij}$ are integers, it is clear that $\lambda_c$ can be 1 only in exceptional cases, but it can take on values arbitrarily near 1 for sufficiently large values of the $n_{ij}$.

If the $N_i$ are large, the quantities $x_{ij} = \frac{n_{ij} - N_i p_j}{\sqrt{N_i}}$ are approximately distributed according to the function

$$F = \left(\frac{1}{(2\pi)^{s-1} p_1 p_2 \cdots p_s}\right)^{\frac{r}{2}} e^{-\frac{1}{2}\sum_{i,j} x_{ij}^2/p_j}, \tag{8}$$

where $\sum_j x_{ij} = 0$, $i = 1, 2, \ldots, r$. By neglecting terms of order $1/\sqrt{N}$ and higher, we find that

$$-2 \log \lambda_c = \sum_{i,j} \frac{1}{N^2 p_j}\Bigl(N x_{ij} - \sqrt{N_i}\,\sum_{l} \sqrt{N_l}\, x_{lj}\Bigr)^2. \tag{9}$$

Denoting the quantity on the right side of (9) by $\chi_0^2$ it follows by straightforward analysis that the characteristic function $\phi(t)$ of $\chi_0^2$ defined by the $r(s - 1)$-tuple integral

$$\phi(t) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} e^{it\chi_0^2}\, F\; dx_{11} \cdots dx_{r,s-1}$$

has the value

$$\phi(t) = (1 - 2it)^{-\frac{(r-1)(s-1)}{2}}. \tag{10}$$

But it is well known that (10) is the characteristic function of any quantity distributed according to (5) with $(k - 1)$ replaced by $(r - 1)(s - 1)$. This, of course, is the $\chi^2$ distribution with $(r - 1)(s - 1)$ degrees of freedom.

It will be noticed that the exact value of $\lambda_c$ is a function of the observations $n_{ij}$ which is independent of the $p$'s, while the approximate value of $-2 \log \lambda_c$ as given by (9) involves the $p$'s. Before (9) could be used practically, one would have to replace the $p$'s by sample estimates, thus making further approximations necessary in order to get the distribution. If the usual estimates $p_j = n_{\cdot j}/N$ are used for the $p$'s in $\chi_0^2$ we find that $\chi_0^2$ reduces to

$$\sum_{i,j} \frac{\left(n_{ij} - \dfrac{n_{i\cdot}\, n_{\cdot j}}{N}\right)^2}{\dfrac{n_{i\cdot}\, n_{\cdot j}}{N}}, \tag{11}$$

which is the familiar $\chi^2$ function for testing independence in contingency tables. However, (11) differs from $\chi_0^2$ by terms of the same order (i.e., $1/\sqrt{N}$) as those by which $\chi_0^2$ differs from $-2 \log \lambda_c$. Since we have neglected terms of the same order in obtaining (8), there is no theoretical reason why (11) should be used rather than $-2 \log \lambda_c$ for testing the hypothesis that the $r$ samples are from a common population.
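Because the exact $\lambda_c$ of equation (7) involves only the observed frequencies, it is as easy to compute as the $\chi^2$ of equation (11). The following minimal sketch computes both for a two-way table given as a list of rows; the function names and the sample tables are hypothetical.

```python
import math

def _xlogx(n):
    """n * log(n), with the convention 0 * log(0) = 0."""
    return n * math.log(n) if n > 0 else 0.0

def neg2_log_lambda_c(table):
    """-2 log(lambda_c) from equation (7), using natural logarithms."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    cells = sum(_xlogx(n) for row in table for n in row)
    return 2.0 * (cells + _xlogx(total)
                  - sum(_xlogx(n) for n in row_totals)
                  - sum(_xlogx(n) for n in col_totals))

def pearson_chi2(table):
    """Equation (11): sum over cells of (n_ij - e_ij)^2 / e_ij."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, n in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (n - expected) ** 2 / expected
    return stat

table = [[30, 10], [10, 30]]      # illustrative 2 x 2 frequencies
print(neg2_log_lambda_c(table), pearson_chi2(table))
```

Both statistics would be referred to the $\chi^2$ distribution with $(r - 1)(s - 1)$ degrees of freedom, here 1. For a table whose rows are exactly proportional, such as `[[10, 20], [20, 40]]`, both are zero up to rounding.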


3. The hypothesis of independence in contingency tables. We shall consider a sample of $N$ observations which can be arranged in a two-way contingency table having $r$ rows and $s$ columns. Let $p_{ij}$ be the probability that an observation will fall in the $i$-th row and $j$-th column. The probability that the sample of $N$ items will be distributed so that $n_{ij}$ will be the number falling in the $i$-th row and $j$-th column ($i = 1, 2, \ldots, r$; $j = 1, 2, \ldots, s$) is given by

$$\frac{N!}{n_{11}!\, n_{12}! \cdots n_{rs}!}\; p_{11}^{n_{11}} p_{12}^{n_{12}} \cdots p_{rs}^{n_{rs}}. \tag{12}$$

Here we are interested in testing the hypothesis that the classification by rows is independent of the classification by columns, that is, that $p_{ij}$ is of the form $p_i q_j$ where

$$\sum_i p_i = 1, \qquad \sum_j q_j = 1. \tag{13}$$

For this hypothesis the appropriate likelihood criterion, say $\lambda_i$, is the ratio of the maximum ($\omega(\max)$) of (12) when $p_{ij} = p_i q_j$ restricted by the conditions (13) to the maximum ($\Omega(\max)$) of (12) subject only to the condition that $\sum_{i,j} p_{ij} = 1$. $\lambda_i$ turns out to be identical with $\lambda_c$ in (7). When the hypothesis of independence is true, the approximate distribution of the quantity $-2 \log \lambda_i$ is the same as that of $-2 \log \lambda_c$ when the hypothesis of a common population is true. To show that the distributions are the same we note that by placing

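$$x_{ij} = \frac{n_{ij} - N p_i q_j}{\sqrt{N}} \tag{14}$$

we find from (12) that the $x_{ij}$ are approximately distributed according to the function

$$F = \frac{1}{\sqrt{(2\pi)^{rs-1}\,(p_1 p_2 \cdots p_r)^s\,(q_1 q_2 \cdots q_s)^r}}\; e^{-\frac{1}{2}\sum_{i,j} \frac{x_{ij}^2}{p_i q_j}} \tag{15}$$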

where ^ Xi, = 0. To the same degree of approximation we find 

t,3 

(16) - 2 log = y) ^ 

TTf 

Now the characteristic function of $\chi_0'^2$ can be shown without much difficulty to
be identical with that of $\chi_0^2$ as given by (10). The identity of the characteristic
functions of $\chi_0'^2$ and $\chi_0^2$ implies the identity of the asymptotic distributions of
$-2 \log \lambda_i$ and $-2 \log \lambda_c$. The problem of testing the hypothesis of a common
population in several samples of grouped data is mathematically equivalent to
that of testing the hypothesis of independence in contingency tables.

If the usual estimates $p_i = n_{i\cdot}/N$, $q_j = n_{\cdot j}/N$ are used in (16) we find that $\chi_0'^2$
becomes the expression given by (11). But (11) differs from $\chi_0'^2$ by terms of
order $1/\sqrt{N}$ and higher. Therefore, $-2 \log \lambda_i$ and (11) can differ from each
other only by terms of order $1/\sqrt{N}$, which is the order of approximation involved
in getting (15) from (12). Thus, $-2 \log \lambda_i$ has as much validity as the usual
criterion (11) for testing for independence in contingency tables.

The $\lambda$ method can easily be extended to the case of contingency tables of
higher order. For example, in a three-way table of $r$ rows, $s$ columns and $t$
layers in which $n_{ijk}$ is the number of items observed in the $i$-th row, $j$-th column
and $k$-th layer, the $\lambda_i$ criterion for testing the hypothesis of independence, that
is, that the probabilities $p_{ijk}$ are of the form $p_{1i}\, p_{2j}\, p_{3k}$, is such that

(17) $-2 \log \lambda_i = 2 \sum_{i,j,k} n_{ijk} \log n_{ijk} + 4 N \log N - 2 \sum_i n_{i\cdot\cdot} \log n_{i\cdot\cdot} - 2 \sum_j n_{\cdot j\cdot} \log n_{\cdot j\cdot} - 2 \sum_k n_{\cdot\cdot k} \log n_{\cdot\cdot k}$

where $n_{i\cdot\cdot} = \sum_{j,k} n_{ijk}$, and so on. $-2 \log \lambda_i$ in this case is approximately
distributed like $\chi^2$ with $rst - r - s - t + 2$ degrees of freedom.
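
The three-way criterion (17) is simple to evaluate mechanically. The following is a minimal Python sketch, assuming the table is supplied as a nested list of counts; the function names and the example table are hypothetical and are not taken from the paper.

```python
import math
from itertools import product


def xlogx(n):
    """n * log(n) with the convention 0 * log(0) = 0."""
    return n * math.log(n) if n > 0 else 0.0


def neg2_log_lambda_3way(table):
    """Return (-2 log lambda, degrees of freedom) for an r x s x t table
    of counts, using the form of equation (17)."""
    r, s, t = len(table), len(table[0]), len(table[0][0])
    cells = list(product(range(r), range(s), range(t)))
    total = sum(table[i][j][k] for i, j, k in cells)
    # One-way marginal totals n_i.., n_.j., n_..k
    rows = [sum(table[i][j][k] for j in range(s) for k in range(t)) for i in range(r)]
    cols = [sum(table[i][j][k] for i in range(r) for k in range(t)) for j in range(s)]
    lays = [sum(table[i][j][k] for i in range(r) for j in range(s)) for k in range(t)]
    stat = (2.0 * sum(xlogx(table[i][j][k]) for i, j, k in cells)
            + 4.0 * xlogx(total)
            - 2.0 * sum(xlogx(m) for m in rows)
            - 2.0 * sum(xlogx(m) for m in cols)
            - 2.0 * sum(xlogx(m) for m in lays))
    return stat, r * s * t - r - s - t + 2


# Hypothetical 2 x 2 x 2 table of counts (illustration only, not from the paper).
example = [[[10, 12], [9, 14]], [[11, 20], [8, 16]]]
stat, dof = neg2_log_lambda_3way(example)
print(stat, dof)
```

The statistic vanishes when every cell count is exactly the product of its three marginal totals divided by $N^2$, which gives a convenient sanity check on any implementation.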

4. Illustrative examples. To illustrate the use of $\lambda_s$ we shall consider the
following example given by R. A. Fisher* dealing with de Winton and Bateson's
data on results of interbreeding the hybrid ($F_1$) generation of Primula in which
two factors are considered.



Flat Leaves 


Crimped Leaves 


Normal Lee’s 

Eye Eye 




* Statistical Methods for Research Workers, 4th ed., p. 84.

If the two factors are Mendelian, that is, segregate independently, the four
classes of offspring resulting from interbreeding the $F_1$ generation are expected
to appear in the ratio 9:3:3:1 (assuming all classes equally viable). We wish to
test the hypothesis of a 9:3:3:1 ratio. It is found that

$-2 \log_e \lambda_s = 2 \log_e 10 \left[ \sum_i n_i \log_{10} n_i - \sum_i n_i \log_{10} (N p_i) \right] = 11.50$.

Entering Fisher's table for $n = 3$, we find that the chance of exceeding the
value 11.50 is less than .01, which is significant if we take $P = .05$ as the critical
level of significant deviation. Thus, the observed frequencies cannot be reasonably
explained as chance deviations from the 9:3:3:1 ratio.

The usual $\chi^2$ method gives $\chi^2 = 10.87$ and $n = 3$ for the 9:3:3:1 hypothesis.
The value of $P$ in this case lies between .01 and .02. It follows from the theoretical
discussion that 10.87 has no greater validity than 11.50 in testing this
hypothesis.
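
The comparison of the two criteria for a specified ratio can be sketched in Python as follows. The observed counts used here (328, 122, 77, 33) are an assumption: they are the Primula frequencies as they appear in Fisher's published table, supplied because the table is not legible in this copy; with them both statistics agree with the values quoted above. The function name is hypothetical.

```python
import math


def goodness_of_fit(observed, ratio):
    """Return (-2 log lambda, Pearson chi-square, degrees of freedom)
    for observed class counts against a hypothesised ratio."""
    total = sum(observed)
    weight = sum(ratio)
    expected = [total * r / weight for r in ratio]
    # Likelihood-ratio criterion: 2 * sum n_i * log(n_i / (N p_i))
    g = 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
    # Usual chi-square: sum (n_i - N p_i)^2 / (N p_i)
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return g, x2, len(observed) - 1


# Assumed counts (see the note above); not recoverable from this copy of the text.
primula = [328, 122, 77, 33]
g, x2, dof = goodness_of_fit(primula, [9, 3, 3, 1])
print(round(g, 2), round(x2, 2), dof)
```

Only one logarithm per class is needed for $-2 \log \lambda$, which is the computational economy the paper remarks on.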

We shall illustrate the use of $\lambda_c$ by using another example given by Fisher
dealing with Wachter's data for back-crosses in mice.



                 Black    Black      Brown    Brown
                 Self     Piebald    Self     Piebald    Total
Coupling:
  F1 Males         88       82         75       60        305
  F1 Females       38       34         30       21        123
Repulsion:
  F1 Males        115       93         80      130        418
  F1 Females       96       88         95       79        358
Total             337      297        280      290       1204


The back-crosses were made according as the male or female parents of the
$F_1$ generation were heterozygous in the two factors Black-Brown, Self-Piebald,
and according to whether the two dominant genes came both from one parent
(Coupling) or one from each parent (Repulsion). We wish to test the hypothesis
that the proportions are independent of the matings used. We find

$-2 \log_e \lambda_c = 2 \log_e 10 \left[ \sum_{i,j} n_{ij} \log_{10} n_{ij} + N \log_{10} N - \sum_i n_{i\cdot} \log_{10} n_{i\cdot} - \sum_j n_{\cdot j} \log_{10} n_{\cdot j} \right] = 21.69$.

Entering Fisher's $\chi^2$ table for $n = 9$ we find that the chance of exceeding this
value is less than .01. The departure from the hypothesis of independence is
significant on the basis of the $P = .05$ level. The $\chi^2$ method gives the remarkably
close result $\chi^2 = 21.83$, which, with $n = 9$, gives $P < .01$.
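
The two-way calculation can be checked with a short Python sketch. The counts are those of the mice table above, with the third Coupling entry taken as 75 so that the row and column totals agree; the function name is hypothetical.

```python
import math


def xlogx(n):
    """n * log(n) with the convention 0 * log(0) = 0."""
    return n * math.log(n) if n > 0 else 0.0


def independence_tests(table):
    """Return (-2 log lambda, Pearson chi-square, degrees of freedom)
    for an r x s contingency table given as a list of rows."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    # -2 log lambda = 2 [sum n_ij log n_ij + N log N - sum n_i. log n_i. - sum n_.j log n_.j]
    g = 2.0 * (sum(xlogx(c) for r in table for c in r) + xlogx(n)
               - sum(xlogx(r) for r in rows) - sum(xlogx(c) for c in cols))
    # Usual chi-square with expected counts n_i. n_.j / N
    x2 = 0.0
    for i, r in enumerate(table):
        for j, c in enumerate(r):
            e = rows[i] * cols[j] / n
            x2 += (c - e) ** 2 / e
    return g, x2, (len(rows) - 1) * (len(cols) - 1)


mice = [
    [88, 82, 75, 60],    # Coupling, F1 males
    [38, 34, 30, 21],    # Coupling, F1 females
    [115, 93, 80, 130],  # Repulsion, F1 males
    [96, 88, 95, 79],    # Repulsion, F1 females
]
g, x2, dof = independence_tests(mice)
print(round(g, 2), round(x2, 2), dof)
```

Natural logarithms are used directly here, so the factor $2 \log_e 10$ that appears in the hand computation with common logarithms is not needed.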

5. Summary. We have considered the exact expressions for the Neyman-Pearson
$\lambda$ criteria appropriate to the following hypotheses: (1) That a sample
of grouped data is from a population with specified group proportions (the
fundamental $\chi^2$ problem), (2) that several samples of grouped data are from a
common population, (3) that there is independence in a contingency table. The
quantity $-2 \log \lambda$ for each of these cases is approximately distributed like $\chi^2$,
the number of degrees of freedom being given in each case. It is shown that the
usual $\chi^2$ method of testing these hypotheses has no greater theoretical validity
than the $\lambda$ method. On the practical side, it is to be remarked that $-2 \log \lambda$
can be computed with fewer operations than $\chi^2$. Two examples are given to
illustrate the practical application of the $\lambda$ method.

Princeton University.



THE PROBABILITY THAT THE MEAN OF A SECOND SAMPLE WILL
DIFFER FROM THE MEAN OF A FIRST SAMPLE BY LESS THAN
A CERTAIN MULTIPLE OF THE STANDARD DEVIATION OF
THE FIRST SAMPLE

By G. A. Baker, Ph.D.

The following statement of the significance of a probable error is often made: 
“The probable error of the mean is a value above and below the mean such that 
if the test were repeated under the same conditions there would be, on the 
average, equal chances that the mean would fall within or without this range.” 
The probable error is attached to the mean of the sample and it is assumed that 
the standard deviation of the sample is that of the sampled normal population. 
This was formerly a very usual explanation of the meaning of probable error by 
research workers, but it is inaccurate and misleading, especially for samples of 
20 or less such as are dealt with in agricultural experiments. The inaccuracy of 
this explanation of the meaning of probable error has been realized for many 
years by competent statisticians, but no satisfactory treatment has heretofore 
been devised.¹

The attempted explanation of the probable error in terms of the expected
frequency of the occurrence of different size deviations of the means of future
samples from the sample mean does raise a very interesting, important, and
legitimate question, namely, what is the probability of a second mean lying within
a certain multiple of the standard deviation of a first sample of the mean of a
first sample? This question is of fundamental concern to those engaged in
experimental work. Its answer will indicate to investigators reasonable deviations
from the results of their first experiments, will form a valid basis for the
rejection of doubtful observations or groups of such observations, and will form
a basis for a test of the significance of the divergence of results in different
experiments. It is found that the usual method of treating the probable error
gives an overly optimistic idea of the smallness of the deviations that may be
expected in future samples.

The distribution function of the variable

$v = \frac{\bar{x} - \bar{z}}{y}$,

where $\bar{x}$ is the mean of the first sample, $\bar{z}$ is the mean of the second sample,
and $y$ is the standard deviation of the first sample, is obtained in this paper.
The sampled population is assumed to be normal.

¹ Camp, Burton H. "Suggested Problems for Mathematical Research," Journal American
Statistical Association, Supplement Vol. 30, No. 189A, Mar. 1935, p. 259, No. 6.

Let the sampled population be represented by

(1) $f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$, $\quad -\infty \le x \le \infty$.

If a sample of $n$ is drawn from (1) the means, as is well known, will be distributed
as proportional to

(2) $e^{-n\bar{x}^2/2}$, $\quad -\infty \le \bar{x} \le \infty$,

and the standard deviation will be distributed as proportional to

(3) $y^{n-2}\, e^{-n y^2/2}$, $\quad 0 \le y \le \infty$.

If a second sample of $n$ is drawn from (1) its mean will be distributed as proportional
to

(4) $e^{-n\bar{z}^2/2}$, $\quad -\infty \le \bar{z} \le \infty$.

Consider the expression

(5) $\frac{\bar{x} - \bar{z}}{y}$

and call it $v$. Then $v$ is the difference between the means of the two samples
measured in terms of the standard deviation of the first sample. The distribution
function of $v$ is sought.

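The three variables $\bar{x}$, $y$, and $\bar{z}$ are independent. Let $y$, for the moment, have
a constant value and write

(6) $vy = \bar{x} - \bar{z}$.

The probability of a given value of $vy$ in $d(vy)$ for a given value of $y$ is now being
sought, that is, $vy$ is regarded as constant. This probability is proportional to

(7) $y^{n-2}\, e^{-\frac{1}{2} n y^2} \int_{-\infty}^{\infty} e^{-\frac{1}{2} n \bar{z}^2}\, e^{-\frac{1}{2} n (\bar{z} + vy)^2}\, d\bar{z}\; d(vy)$

from the application of the following

Lemma. If $x$ and $y$ are independent variables, $-\infty \le x \le \infty$, $-\infty \le y \le \infty$,
and the probability of an $x$ in $dx$ is $f(x)\,dx$ and the probability of a $y$ in $dy$ is
$\varphi(y)\,dy$, then the probability of $v = y - x$ in $dv$ is proportional to²

$\int_{-\infty}^{\infty} f(x)\, \varphi(v + x)\, dx\; dv$.

Thus the probability of a value of $v$ in $dv$ for a given $y$ is proportional to

(8) $y^{n-1}\, e^{-\frac{1}{2} n y^2} \int_{-\infty}^{\infty} e^{-\frac{1}{2} n \bar{z}^2}\, e^{-\frac{1}{2} n (\bar{z} + vy)^2}\, d\bar{z}\; dv$

² Baker, G. A. "Random Sampling from Non-Homogeneous Populations," Metron,
Vol. 8, No. 3, Feb. 1930, p. 68 (slightly modified).

since $d(vy) = y\,dv$ for $y$ constant. Hence the total probability of a particular
value of $v$ in $dv$ will be given as proportional to

(9) $dv \int_0^{\infty} y^{n-1}\, e^{-\frac{1}{2} n y^2} \int_{-\infty}^{\infty} e^{-\frac{1}{2} n \bar{z}^2}\, e^{-\frac{1}{2} n (\bar{z} + vy)^2}\, d\bar{z}\; dy$

which is proportional to

(10) $\frac{dv}{(1 + v^2/2)^{n/2}}$.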
If the number in the first sample is $n_1$ and the number in the second sample is
$n_2$, then (10) becomes

(11) $\frac{dv}{\left(1 + \frac{n_2}{n_1 + n_2}\, v^2\right)^{n_1/2}}$.

This distribution, (11), permits an answer to be given to the question, what is
the probability that the mean of a sample of a given size $n_2$ will differ from the
mean of a first sample of size $n_1$ by as much as a constant multiple of the standard
deviation of the first sample? Thus, this distribution gives a clear and comprehensible
indication of the expected conformity of future experiments and gives
a valuable test for the significance of the difference between two means. If it is
desired to use this distribution as a rejection criterion, $n_1$ should be taken so as
to include as many items as possible and so as to exclude the doubtful ones.
The doubtful items should be included in the second sample. If the original
sample is broken up into two or more samples it must be done in such a way as
not to destroy the randomness of the resulting parts.
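
Probabilities under distribution (11) can be obtained by direct numerical integration. The Python sketch below assumes the kernel $(1 + n_2 v^2/(n_1+n_2))^{-n_1/2}$, which is the form implied by the substitution into "Student's" distribution stated later in the paper, with the first-sample standard deviation computed with divisor $n_1$; the function names are hypothetical.

```python
import math


def v_kernel(v, n1, n2):
    """Unnormalised density of v: (1 + n2 v^2 / (n1 + n2)) ** (-n1 / 2)."""
    return (1.0 + n2 * v * v / (n1 + n2)) ** (-n1 / 2.0)


def simpson(f, a, b, steps=20000):
    """Composite Simpson's rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def prob_abs_v_below(c, n1, n2):
    """P(|v| <= c) for first-sample size n1 >= 2 and second-sample size n2."""
    nu = n1 - 1  # degrees of freedom of the equivalent t
    # v = t * sqrt((n1 + n2) / (n2 * (n1 - 1))), so dt = k dv with:
    k = math.sqrt(n2 * (n1 - 1) / (n1 + n2))
    const = k * math.gamma((nu + 1) / 2.0) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2.0))
    return const * simpson(lambda v: v_kernel(v, n1, n2), -c, c)


# Probability that a second mean of 10 falls within one first-sample
# standard deviation of a first mean of 10.
print(prob_abs_v_below(1.0, 10, 10))
```

The normalising constant is taken from the $t$ density so that no separate integration over the whole line is needed.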

Example. Suppose for the purpose of illustration that a sample of four is to
be considered. The proper $v$-distribution is

$\frac{\sqrt{2}\; dv}{\pi \left(1 + v^2/2\right)^2}$.

The value of v which is necessary to give a probability of one-half is a root of 


tan"^ 


V2 


I 1 V2P 

2 p* + 2 


TT 

4 


which is .9. That is, an interval of 1.8 times the standard deviation of the 
sample of four with center at the mean of the sample is necessary for a proba- 
bility of one-half that the mean of the next sample of four will lie in this interval. 
This compares with .76 times the standard deviation of the sample if 


g 
is used as the probable error of the mean and with .65 times the standard deviation
of the sample if


(T 



is used as the probable error of the mean. The last two methods of calculating a 
probable error with the interpretation indicated at the beginning of this paper 
give the investigator an unwarranted feeling of assurance about the agreement 
of future samples with a first sample. 

If two samples of $n_1$ and $n_2$ are drawn from the normal population, (1), then
these samples can be combined for the purpose of calculating a standard deviation
and the difference between the means of the samples can be measured in
terms of the standard deviation of the combined sample. The distribution
function of the difference of the means divided by the standard deviation of the
combined sample is

(11') $\frac{dv}{\left[1 + \frac{n_1 n_2}{(n_1 + n_2)^2}\, v^2\right]^{(n_1 + n_2)/2}}$.

This distribution, (11'), is the basis for a valid test for the significance of the
difference between two means. If either this test or the test based on distribution
(11) shows a significant difference between the means it can not be ignored.
"Student's" $t$-distribution is proportional to

(12) $\frac{dt}{\left(1 + \frac{t^2}{N - 1}\right)^{N/2}}$.

The above distributions can be easily transformed into $t$-distributions so that
"Student's" tables can be used. For instance, if we put

$v = \frac{\sqrt{2}\; t}{\sqrt{n - 1}}$, $\quad N = n$,

then (10) becomes proportional to (12). Again, put

$v = \frac{\sqrt{n_1 + n_2}\; t}{\sqrt{n_2}\, \sqrt{n_1 - 1}}$, $\quad N = n_1$,

and (11) becomes proportional to (12). Finally, put

$v = \frac{(n_1 + n_2)\; t}{\sqrt{n_1 n_2}\, \sqrt{n_1 + n_2 - 1}}$, $\quad N = n_1 + n_2$,

and (11') becomes proportional to (12).


Summary. The distributions found for the difference of the means of two
samples in terms of a standard deviation of one sample or combination of both
samples are similar to and easily transformed into "Student's" $t$-distribution so
that his tables can be used. However, these distributions answer a practical,
interesting, and important question that "Student's" $t$-distribution does not.
If in an experimental science a series of observations is made it is desirable to
know how much a similar series of observations could be expected to differ from
the set of observations now available. This deviation, if it is to mean anything,
must be expressed in terms of quantities available from the observations already
made. This paper gives the probability function of a deviation in the mean of
a future sample measured from the mean of a first sample and measured in terms
of the standard deviation of a first sample, that is, in terms of quantities known
from the first sample. It is a very definite advantage and a great gain in assurance
to know the point from which measurements are being made and the unit
in which they are expressed instead of making vague, ill-defined assumptions
about the zero point and unit length of the measuring scale. It is true that
differences that were formerly considered significant may not be so considered
now. But these differences would appear insignificant if experiments were
sufficiently repeated, so that the net result is fewer inconsistencies to explain
away.



ON SAMPLES FROM A MULTIVARIATE NORMAL POPULATION¹

By Solomon Kullback

1. Introduction. In this paper we shall discuss the distribution of certain
functions calculated for samples drawn from a multivariate normal population.
The method of solution is based on the theory of characteristic functions and
presents further application of that theory to the distribution problem of
statistics.²

We shall have occasion to refer to the multivariate normal population whose
distribution law is given by

(1.1) $F(x) = \pi^{-n/2}\, |B_{pq}|^{1/2}\, e^{-B(x - m,\, x - m)}$, $\quad (p, q = 1, 2, \cdots, n)$

where $B(x - m, x - m)$ is the real, positive definite quadratic form of the
$x_p - m_p$ with matrix $\|B_{pq}\|$. Here $m_p$ is the mean in the population of the $p$th
variate and $B_{pq} = \Delta_{pq}/2\sigma_p\sigma_q\Delta$ where $\sigma_p$ is the standard deviation in the population
of the $p$th variate; $\Delta$ is the determinant of population correlations $\rho_{pq} = \rho_{qp}$;
$\Delta_{pq}$ is the co-factor of $\rho_{pq}$ in $\Delta$; and $|B_{pq}|$ is the determinant of the matrix $\|B_{pq}\|$.

Since the integral of (1.1) over the entire field of variation of the variables is
unity, we have (using abbreviated notation)

(1.2) $\int e^{-B(x - m,\, x - m)}\, dx = \pi^{n/2}\, |B_{pq}|^{-1/2}$.

Equation (1.2) will be true if $\|B_{pq}\|$ is complex, provided its real part is symmetric
and positive definite.³

The distribution of sample means of samples from the population (1.1) is
independent of the distribution of the system of sample variances and covariances
and is given by⁴

(1.3) $F_1(\bar{x}) = \pi^{-n/2}\, |A_{pq}|^{1/2}\, e^{-A(\bar{x} - m,\, \bar{x} - m)}$

where $A(\bar{x} - m, \bar{x} - m)$ is the real, positive definite quadratic form of the $\bar{x}_p - m_p$
with matrix $\|A_{pq}\|$. Here $\bar{x}_p = (1/N) \sum_{\alpha=1}^{N} x_{p\alpha}$ is the sample mean of the $p$th


¹ Presented to the American Mathematical Society, February 23, 1935.

² For more complete reference to the theory of characteristic functions as applied to
statistics see S. Kullback, Annals of Mathematical Statistics, Vol. 5 (1934), pp. 263-307.

³ J. Wishart and M. S. Bartlett, Proc. Cambridge Phil. Soc., Vol. 29 (1933), pp. 260 ff.

⁴ J. Wishart, Biometrika, Vol. 20A (1928), pp. 32-52.

J. Wishart and M. S. Bartlett, loc. cit.

variate, and $A_{pq} = N B_{pq}$, where $B_{pq}$ has been defined for equation (1.1). The
distribution law of the system of sample variances and covariances is given by⁵

(1.4) $F_2(a) = \frac{|A_{pq}|^{(N-1)/2}\, |a_{pq}|^{(N-n-2)/2}\, e^{-A(a)}}{\pi^{n(n-1)/4} \prod_{r=1}^{n} \Gamma\left(\frac{N - r}{2}\right)}$

where $A(a) = \sum_{p,q=1}^{n} A_{pq} a_{pq}$ and $a_{pq} = a_{qp} = (1/N) \sum_{\alpha=1}^{N} (x_{p\alpha} - \bar{x}_p)(x_{q\alpha} - \bar{x}_q)$
with $A_{pq}$ and $\bar{x}_p$ defined as for (1.3). Since the integral of (1.4) over the entire
field of variation of the $a_{pq}$ is unity, we have⁶

(1.5) $\int |a_{pq}|^{(N-n-2)/2}\, e^{-A(a)}\, da = \pi^{n(n-1)/4}\, |A_{pq}|^{-(N-1)/2} \prod_{r=1}^{n} \Gamma\left(\frac{N - r}{2}\right)$.

Equation (1.5) will also hold if the matrix $\|A_{pq}\|$ is complex, provided its real
part is symmetric and positive definite.⁷

2. Variance. Consider a sample of $N$ independent items from the normal
population (1.1). Let

(2.1) $v = \sum_{p,q=1}^{n} a_{pq}$

where $a_{pq}$ is defined as in (1.4). From the theory of characteristic functions
and (1.5), we have that the characteristic function of the distribution law of $v$
is given by⁸

(2.2) $\varphi(t) = \int e^{itv}\, F_2(a)\, da = |A_{pq}|^{(N-1)/2}\, |A_{pq} - it|^{-(N-1)/2}$.

It may be readily shown that

(2.3) $|A_{pq} - it| = |A_{pq}| - it \sum_{p,q=1}^{n} A^{pq}$

where $A^{pq}$ is the co-factor of $A_{pq}$ in $|A_{pq}|$.
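
Identity (2.3) states that subtracting the same quantity $it$ from every element of a matrix changes its determinant by $it$ times the sum of all co-factors. A small numerical check in Python is sketched below; the 3 x 3 matrix and the value of $t$ are arbitrary illustrations, and the helper names are hypothetical.

```python
def det(m):
    """Determinant by Laplace expansion along the first row
    (adequate for the small matrices used here)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def cofactor_sum(m):
    """Sum of all co-factors A^{pq} of a square matrix of order >= 2."""
    n = len(m)
    total = 0
    for p in range(n):
        for q in range(n):
            minor = [row[:q] + row[q + 1:] for i, row in enumerate(m) if i != p]
            total += (-1) ** (p + q) * det(minor)
    return total


def shifted_det(m, t):
    """Determinant of the matrix with elements A_pq - i t."""
    return det([[entry - 1j * t for entry in row] for row in m])


# Arbitrary symmetric positive definite example.
A = [[4.0, 1.0, 0.5], [1.0, 3.0, 0.2], [0.5, 0.2, 2.0]]
t = 0.7
lhs = shifted_det(A, t)
rhs = det(A) - 1j * t * cofactor_sum(A)
print(lhs, rhs)
```

The result is linear in $t$ because the subtracted matrix has rank one, which is why the characteristic function (2.2) reduces to a power of a single linear factor.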

We thus have for the distribution law⁸ of $v$

(2.4) $P(v) = (A/c)^{(N-1)/2}\, \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itv}\, (A/c - it)^{-(N-1)/2}\, dt$

⁵ J. Wishart, loc. cit.

⁶ Cf. S. S. Wilks, Biometrika, Vol. 24 (1932), pp. 471-494.

⁷ A. E. Ingham, Proc. Cambridge Phil. Soc., Vol. 29 (1933), p. 271 ff. The considerations
in this paper will still hold if the condition above is imposed.

⁸ S. Kullback, loc. cit., p. 272.

where $A = |A_{pq}|$, $c = \sum_{p,q=1}^{n} A^{pq}$ and $A/c > 0$ since $\|A_{pq}\|$ is positive definite.

By using the fact that⁹

(2.5) $\frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itu}\, (a - it)^{-k}\, dt = \begin{cases} u^{k-1} e^{-au}/\Gamma(k), & u > 0 \\ 0, & u < 0 \end{cases}$

where $k > 0$, $a > 0$, we have

(2.6) $P(v) = \frac{(A/c)^{(N-1)/2}}{\Gamma\left(\frac{N-1}{2}\right)}\, v^{(N-3)/2}\, e^{-(A/c)\, v}$.

3. Ratio of variances. If vi and «2 represent the statistic v (defined in 

(2.1) ), obtained from independent samples of JVi and iVa items respectively, then 
it may be shown that the distribution law of lo = ui/vj is given by'® 

(Q 11 PC,, A _ r(jVi 4- 2j/ 2 ^ u,t«,-3)/2Ci 4- 

(3.1) rtiD) - r(J^a - l)/2 ^ 

If we set w = ni/na, where ni = iVi — 1 and nz == iVa — 1 we obtain for the 
distribution law of z" 

(3.2) P(z) = 2 (n, + . 

4. Student's distribution. Consider a sample of $N$ independent items from
the normal population (1.1). Let

(4.1) $\mu = \sum_{p,q=1}^{n} (\bar{x}_p - m_p)(\bar{x}_q - m_q)$

where $\bar{x}_p$ and $m_p$ are defined as in (1.3). The characteristic function of the
simultaneous distribution function of $\mu$, defined as in (4.1), and $v$, defined as in
(2.1), is given by

(4.2) $\varphi(t_1, t_2) = \int \exp\left\{ i t_1 \sum_{p,q=1}^{n} (\bar{x}_p - m_p)(\bar{x}_q - m_q) + i t_2 \sum_{p,q=1}^{n} a_{pq} \right\} F_1(\bar{x})\, F_2(a)\, d\bar{x}\, da$

⁹ Cf. A. E. Ingham, loc. cit.

¹⁰ J. Wishart and M. S. Bartlett, Proc. Cambridge Phil. Soc., Vol. 28 (1932), p. 456 ff.

S. Kullback, note accepted for publication soon in the Annals of Math. Statistics.

¹¹ Cf. R. A. Fisher, I. Proc. International Math. Congress, Toronto (1924), Vol. 2, pp. 805-813.

R. A. Fisher, II. Statistical Methods for Research Workers, 4th Edition (1932), Edinburgh:
Oliver and Boyd, pp. 224-227.

where $F_1$ and $F_2$ are defined as in (1.3) and (1.4) respectively. From (1.2) and
(1.5) we have that

(4.3) $\varphi(t_1, t_2) = (A/c)^{N/2}\, (A/c - it_1)^{-1/2}\, (A/c - it_2)^{-(N-1)/2}$

where $A$ and $c$ are defined as in (2.4). The simultaneous distribution of $\mu$ and
$v$ is given by

(4.4) $P(\mu, v) = \left(\frac{1}{2\pi}\right)^2 \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-it_1\mu - it_2 v}\, \varphi(t_1, t_2)\, dt_1\, dt_2$

which evaluated by a procedure similar to that used for (2.4) yields

(4.5) $P(\mu, v) = \frac{(A/c)^{N/2}\, \mu^{-1/2}\, v^{(N-3)/2}\, e^{-\mu A/c - v A/c}}{\Gamma\left(\frac{N-1}{2}\right) \Gamma\left(\frac{1}{2}\right)}$.

From (4.5) we may readily obtain the distribution of $z = \sqrt{\mu/v}$ to be¹²

(4.6) $P(z) = \frac{2\, \Gamma\left(\frac{N}{2}\right)}{\Gamma\left(\frac{N-1}{2}\right) \Gamma\left(\frac{1}{2}\right)}\, (1 + z^2)^{-N/2}$, $\quad (0 \le z \le \infty)$.

5. $k$ samples. Suppose we have $k$ independent samples of $N_1, N_2, \cdots, N_k$
items respectively, drawn from the normal population defined by (1.1). Let
$\mu_r$, $(r = 1, 2, \cdots, k)$ be the statistic $\mu$, defined by (4.1), for each of the $k$ samples
respectively; let $v_r$, $(r = 1, 2, \cdots, k)$ be the statistic $v$, defined by (2.1),
for each of the $k$ samples respectively; let $\mu_0$ and $v_0$ be the values of these statistics
for the sample of $N = N_1 + N_2 + \cdots + N_k$ items obtained by pooling
all the samples.

It may be readily verified that 

(5.1) MO = Z MrNl/m + 2 E (« /3) 

r=s 1 a, 

(5.2) NmO +NV^= E (NrMr + NrVr) 

r~l 

(5.3) NVo = E iNrVr + Mrflr) - 2 E I^V^NaN^/N (ct ^ fi) 

r“l o,P = l 

where Mr = (NNr - Nl)/N. 

In view of (2.6) and (4.5), it is evident that the simultaneous distribution
law of $\mu_r$, $v_r$, $(r = 1, 2, \cdots, k)$ is given by

(5.4) Pin) • Qiv) s n P(.Mr; Nr) QiVr; Nr) 

r-l 


¹² Cf. "Student," Biometrika, Vol. 6 (1908-09), pp. 1-25.

R. A. Fisher, Metron, Vol. 5 (1925), pp. 90-104.

P. R. Rider, Annals of Mathematics, 2nd S., Vol. 31 (1930), pp. 679-582.

where

(5.5) PilXn Nr) = ^ (B/D)^I^ ^7'/^ e~^r.r BIO 

(5.6) QiVr', m) = |r(7r.-3)/Z 


and B is the determinant ] Bpq j defined in (1.1) and X> = 22 where is 

p, ([“■i 

the co-factor of Bpq in | Bpq [. 

Using (5.3) and (5.4), we find that the characteristic function of the simul- 
taneous distribution law of v’r = Fr 5/D, (r = 0, 1, • • • , i) is given by 

(5.7) v(io, h, • • • , fO = J - -'k) P{u ) . Q(i?)d«dt) 

where 


t7(«,) = (B tVi>) I E MN -2 ni^^ixy^NMm] , 

l.r-1 «, S“1 J 

and 

F(to, h, • • • , to = (B/D) 1 2 T^r{ftr + fto 
Let MrB/D = f r and VrB/D = »)„ (r = 1, 2, • . • , fc) and rewrite (5.7) as 


the product of fc -H 1 integrals 



(8.8) 

^(fo> h, • • • , tk) = Jo/i • 

• I* 


where 




(6.9) 

r _ {NiN2 . • ■ Nk)^i^ f _ 

" r(W j ‘ 

-r(f, n 

dt 

with 





k 

Tit, f) = E tliNr - ito Mr/N) + 2 fto 

k 

E 

totfiNaNfi/N^ , {ot?^0) 


r = l 

n, 

and 





iy(JVr-l)/2 /•« 

(5-10) Ir = exp {- nr (i\r. - it^r/N - ftOl dl»r . 
By employing (1.2) we find that


Ni-ikMi/N UoNiN^/m 


( 6 . 11 ) h = iN,N , . . . N,y‘^ 


ita N^i/N^ Ni - ik Mi/N . . . 


ikNiNk/N^ 


-i/j 


ik NJfk/m 


I ik NkNi/m ito NkNi/m ■ ■ . JVt - ik Mt/N | 

The determinant may be readily evaluated by removing the common factor $N_r$
from the $r$th row (remembering the value of $M_r$ as given in (5.3)) and applying
the operations¹³ (row 1 - row 2), (row 2 - row 3), $\cdots$, and then column $k$ +
column 1 + column 2 + $\cdots$ + column $k - 1$. We thus obtain


(5.12) 


/o = (1 - 


The integral in (5.10) is well-known and yields 

(6.13) Ir = (Nr - ikNr/N - 

There thus results 


(6.14) <fi(k, <1, • • ■ , tk) = 0(1 - n (Na ~ ikN^/N - 


a«l 


where (7 = H . 

a=» 1 

The simultaneous distribution law of <pr, (r = 0, 1, • • • , it) is given by 

0 


P(<p0l Vli > Vk) = 


(2t)*’+i 


(5.16) 


er'‘o n-^'i fi- • ■ -’** dto d<i • • • dtk 


(1 - iU/NY'‘~^'>i^ n (^« - ikNaIN - 

0=1 


Integrating successively with respect to t*, tk-i, ,ti and applying (2.5) we have 


P(<Po, <pi, ■ ■ ‘ , Vk) = 0 exp I — ^ n 


<pa 


,(y„-j)/2 i 

t* ^ 


(6.16) 


ami 


\ T(Na - l)/2 2 t 

/ATI J'f* \ 

(1 - ito/NY^-'^'* 


¹³ Cf. A. C. Aitken, Quarterly Journal Math., Vol. 2 (1931), pp. 130-135.

and finally

P{<Po> ¥>!,•••, n) = 

(5.17) (^0 _ N,<pi/N IT 

r(fc - 1)/2 1 i vm 7 :rTj 2 ) ■ 

If we apply to (5.17) the transformation 

|(tf0 = (Po 

(5.18) 

[<Pt = N^r<po/Nr ir = 1,2, . . . ,h) 

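and integrate out $\varphi_0$, we obtain for the simultaneous distribution law of
$\xi_r = N_r v'_r / N v'_0 = N_r v_r / N v_0$

(5.19) $D(\xi_1, \xi_2, \cdots, \xi_k) = \frac{\Gamma\left(\frac{N-1}{2}\right)}{\Gamma\left(\frac{k-1}{2}\right)}\, (1 - \xi_1 - \xi_2 - \cdots - \xi_k)^{(k-3)/2} \prod_{\alpha=1}^{k} \frac{\xi_\alpha^{(N_\alpha - 3)/2}}{\Gamma\left(\frac{N_\alpha - 1}{2}\right)}$

where the limits of variation in (5.19) are¹⁴

(5.20) $0 \le \xi_1 \le 1$; $\quad 0 \le \xi_r \le 1 - \xi_1 - \xi_2 - \cdots - \xi_{r-1}$, $\quad (r = 2, 3, \cdots, k)$.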

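6. Correlation ratio. Let $\zeta = \log (1 - \xi_1 - \xi_2 - \cdots - \xi_k)$ where the
$\xi_r$, $(r = 1, 2, \cdots, k)$ are defined and distributed as in (5.19). The characteristic
function of the distribution law of $\zeta$ is given by

(6.1) $\varphi(t) = \frac{\Gamma\left(\frac{N-1}{2}\right)}{\Gamma\left(\frac{k-1}{2}\right)} \int (1 - \xi_1 - \xi_2 - \cdots - \xi_k)^{(k-3+2it)/2} \prod_{\alpha=1}^{k} \frac{\xi_\alpha^{(N_\alpha - 3)/2}}{\Gamma\left(\frac{N_\alpha - 1}{2}\right)}\, d\xi_\alpha$

where the limits of variation are given by (5.20). The integral in (6.1) is readily
evaluated as a Dirichlet integral, and we obtain

(6.2) $\varphi(t) = \frac{\Gamma\left(\frac{N-1}{2}\right) \Gamma\left(\frac{k-1+2it}{2}\right)}{\Gamma\left(\frac{k-1}{2}\right) \Gamma\left(\frac{N-1+2it}{2}\right)}$.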

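¹⁴ Cf. J. Neyman and E. S. Pearson, I. Bulletin de l'Académie Polonaise des Sciences et
des Lettres, Série A, Sciences Mathématiques, 1931, pp. 460-481.

E. Goursat-E. R. Hedrick, Mathematical Analysis, Vol. I (1904) (Ginn and Co., N. Y.),
p. 308.

The distribution law of $\zeta$ is given by

(6.3) $P(\zeta) = \frac{\Gamma\left(\frac{N-1}{2}\right)}{\Gamma\left(\frac{k-1}{2}\right)}\, \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-it\zeta}\, \frac{\Gamma\left(\frac{k-1+2it}{2}\right)}{\Gamma\left(\frac{N-1+2it}{2}\right)}\, dt$.

Now it may be shown that¹⁶

(6.4) $\frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-it\zeta}\, \frac{\Gamma\left(\frac{k-1+2it}{2}\right)}{\Gamma\left(\frac{N-1+2it}{2}\right)}\, dt = \frac{e^{\zeta(k-1)/2}\, (1 - e^{\zeta})^{(N-k-2)/2}}{\Gamma\left(\frac{N-k}{2}\right)}$

so that

(6.5) $P(\zeta) = \frac{\Gamma\left(\frac{N-1}{2}\right)}{\Gamma\left(\frac{k-1}{2}\right) \Gamma\left(\frac{N-k}{2}\right)}\, e^{\zeta(k-1)/2}\, (1 - e^{\zeta})^{(N-k-2)/2}$.

If we set $e^{\zeta} = \eta^2$, then we obtain for the distribution¹⁷ of $\eta^2$

(6.6) $D(\eta^2) = \frac{\Gamma\left(\frac{N-1}{2}\right)}{\Gamma\left(\frac{k-1}{2}\right) \Gamma\left(\frac{N-k}{2}\right)}\, (\eta^2)^{(k-3)/2}\, (1 - \eta^2)^{(N-k-2)/2}$.

From its definition we have that

(6.7) $\eta^2 = (N v_0 - N_1 v_1 - \cdots - N_k v_k)/N v_0$

which reduces to

(6.8) $\eta^2 = (N_1 W_1 + N_2 W_2 + \cdots + N_k W_k)/N v_0$

where $W_\alpha = \sum_{p,q=1}^{n} (\bar{x}_{p\alpha} - \bar{x}_{p0})(\bar{x}_{q\alpha} - \bar{x}_{q0})$ with $\bar{x}_{p\alpha}$ the sample mean of the $p$th
variate in the $\alpha$th sample and $\bar{x}_{p0}$ the sample mean of the $p$th variate in the
sample formed by pooling all the samples.¹⁸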
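
The form (6.6) is a Beta-type density in $\eta^2$ with parameters $(k-1)/2$ and $(N-k)/2$. The Python sketch below evaluates it and checks by midpoint integration that it integrates to unity and has mean $(k-1)/(N-1)$; the function names and the values of $N$ and $k$ are illustrative assumptions.

```python
import math


def eta2_density(x, N, k):
    """Density of eta^2 in the form of (6.6), for 0 < x < 1."""
    a = (k - 1) / 2.0   # exponent of x is a - 1 = (k - 3) / 2
    b = (N - k) / 2.0   # exponent of (1 - x) is b - 1 = (N - k - 2) / 2
    const = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return const * x ** (a - 1.0) * (1.0 - x) ** (b - 1.0)


def midpoint_integral(f, steps=100000):
    """Midpoint rule on (0, 1); accurate here when k >= 3 and N - k >= 2,
    so that the integrand has no endpoint singularity."""
    h = 1.0 / steps
    return h * sum(f((i + 0.5) * h) for i in range(steps))


N, k = 12, 5
total = midpoint_integral(lambda x: eta2_density(x, N, k))
mean = midpoint_integral(lambda x: x * eta2_density(x, N, k))
print(total, mean, (k - 1) / (N - 1))
```

For $k = 2$ the density is unbounded at the origin and a quadrature rule adapted to the endpoint singularity would be needed.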

In a similar manner, we have that the distribution law of $\eta_\alpha^2 = \xi_\alpha$,
$(\alpha = 1, 2, \cdots, k)$ is given by

(6.9) $D(\eta_\alpha^2) = \frac{\Gamma\left(\frac{N-1}{2}\right)}{\Gamma\left(\frac{N_\alpha - 1}{2}\right) \Gamma\left(\frac{N - N_\alpha}{2}\right)}\, (\eta_\alpha^2)^{(N_\alpha - 3)/2}\, (1 - \eta_\alpha^2)^{(N - N_\alpha - 2)/2}$.

It may be of interest to point out another derivation for the distribution of
$h^2 = 1 - \eta^2$. Let

(6.10) $\theta = (B/D)(N_1 v_1 + N_2 v_2 + \cdots + N_k v_k)$, $\quad \theta_0 = (B/D)\, N v_0$.

¹⁶ Whittaker and Watson, Modern Analysis, 2nd Ed., pp. 283, 333.

¹⁷ Cf. R. A. Fisher, loc. cit., I.

H. Hotelling, Proc. National Academy of Sciences, Vol. XI (1925), pp. 657-662.

¹⁸ Cf. S. S. Wilks, loc. cit., p. 482.
The characteristic function of the simultaneous distribution law of $\theta$ and $\theta_0$ is
immediately derivable from (5.14) by replacing $t_0$ by $N t_0$ and $t_r$ by $N_r t$
$(r = 1, 2, \cdots, k)$. There results

(6.11) $\varphi(t, t_0) = (1 - it_0)^{-(k-1)/2}\, (1 - it_0 - it)^{-(N-k)/2}$.

By a procedure similar to that already used we find that the simultaneous distribution
law of $\theta$ and $\theta_0$ is given by

(6.12) $P(\theta, \theta_0) = \frac{\theta^{(N-k-2)/2}\, (\theta_0 - \theta)^{(k-3)/2}\, e^{-\theta_0}}{\Gamma\left(\frac{N-k}{2}\right) \Gamma\left(\frac{k-1}{2}\right)}$.

By applying to (6.12) the transformation $\theta = \theta_0 h^2$, $\theta_0 = \theta_0$ and integrating out
the value of $\theta_0$, we find for the distribution law of $h^2$

(6.13) $D(h^2) = \frac{\Gamma\left(\frac{N-1}{2}\right)}{\Gamma\left(\frac{N-k}{2}\right) \Gamma\left(\frac{k-1}{2}\right)}\, (h^2)^{(N-k-2)/2}\, (1 - h^2)^{(k-3)/2}$.

From (6.12) and (6.10) it may be shown that the following estimates of variance
all have the same expected value¹⁹

(6.14) $\frac{N_1 v_1 + N_2 v_2 + \cdots + N_k v_k}{N - k}$, $\quad \frac{N v_0}{N - 1}$, $\quad \frac{N_1 W_1 + N_2 W_2 + \cdots + N_k W_k}{k - 1}$.

7. Distribution of variances. Let 

(dr==NrVrB/D (r = 1, 2, • • • , ifc) 

(7.1) Joo = iVyoB/D 

[fl = (B/D) (ATiFi + mVo +••.-!- mVk) 

where the right members of (7.1) are defined as in section 5. It is evident that 
the characteristic function of the simultaneous distribution law of 0, do, Sr, 
(r = 1, 2, •••,& — 1) is derivable from (5.14) by replacing <q by NU, U by 
Nr{ir + 0) (»■ = 1, 2, • . . , fc — 1) and tk by Nkt. Thus 

<p(.t, to, ti, • • • , tk-i) = (1 - f<o)-»-«'* 

(7.2) k-i 

(1 _ ito - n (1 - ^'^0 - ita - . 


¹⁹ Cf. J. Neyman and E. S. Pearson, II. Biometrika, Vol. 20A (1928), pp. 273-274.
S. Kullback, Annals of Mathematical Statistics, Vol. 6 (1935), pp. 76-77.

By proceeding as in section 5 we arrive at the result that the simultaneous
distribution law of $\theta$, $\theta_0$, $\theta_r$, $(r = 1, 2, \cdots, k - 1)$ is given by


t>/. . -01-02 

F(p,do,Or) — T- ^ 


(7.3) 


r(A: - l)/2 r(iVi - l)/2 


11 V(N 


i-Jj r(F„-i )/2 

where Oo S 0, 0 ^ 0i + 02 + • • • + 0fc_i. 

By integrating out the variable 0o from (7.3) we have for the simultaneous 
distribution law of 6, 6r, (r = 1,2, ... ,k — 1) 


(7.4) D(0,0.) = 


e-« (6 -Oi-di- 


V(N^ - l)/2 


n 

fl£*“ 1 


T(N. - l)/2 


A procedure similar to that used to derive (5.19) yields for the simultaneous 
distribution law of 


(7.5) 

(7.6) 


h = 0r/0 

Pih, h,-’-, h-i) = (1 - if-i - 


(r = 1, 2, . . . , fc - 1) 

\^*-i) 


ifc-1 

n 


^(H'„-8)/2 

r(A„ - 1)/2 


(7.7) 


where the limits of variation in (7.6) are^ 

|0 g ^1 ^ 1 

(0 g ^ 1 — ^1 — ^2 — • • • — yj/r-i , {r = 2, ... ,k — 1) . 

In a manner similar to the derivation of (6.6) we find the distribution law of 
ki = \pa, (oc — 1, 2, . . . , fc — 1), fl| = 1 — ^1 — ^2 — ' • • — ^i-1 to be 

r(Af-fc)/2 - 

(7.8) ~'nN„ - l)/2 r(Ar - A - iV„ + l)/2 

(ft2)(V«-8)/2 (1 __ /i 2)(V-*^V,^1>/2 ^ (a = 1, 2, . . . , A) . 

From the distribution law in (7.3) we readily obtain that the characteristic 
function of the distribution law of yl = log {Oaf (0o — 0) is given by 

(>7 O', (A r(Ar„ - 1 + 2it)/2 r(fc - 1 - 2ii)/2 ^ . 2 • • ■ k) 

(7.9) ^(f) = r(iv<. - l)/2 r(A - l)/2 ” ’ ’ ' ^ 


*• Cf. J. Neyman and E. S. Pearson, loe. dt., I. 
We thus have that the distribution law of 7® is given by


P(7^) = 


(7.10) 


r(JV„ - i )/2 r(fc - i )/2 2 ir 

J” r(N„ - 1 + 2it)/2 V(k-1- 2it)/2 dt . 

The integral in (7,10) is known,** and there results 

(7 11) P(y^) ~ r(iV« + k- 2)12 / r^ycva+fc-a/a 

U- ) iTal r(JV„ - l)/2r(fc- l)/2 y + e J 

If We set e “ = d^/iBa — 0) = we have for the distribution of X“ 

(7.12) D(x“) = r(Wa + k - 2)/2 Q 

^ ^ ^ r(i\r. - i)/ 2 r(A:-i)/ 2 ''^“^ u -h ^ J 


An extension of the procedure used to obtain (7.9) yields as the characteristic 
function of the simultaneous distribution of 7?, 72, ... ,7^ 


<pih, k, ' • • , k) 


(7.13) 


r(fc — 1 — 2f<i — 2fi2 — • 
r(fc - 1)/2 


■ -2f4)/2 


TT r(N„ - 1 + 2iQ/2 

U V{N. - l)/2 


Successive application of the method used to evaluate (7.10) yields as the simul- 
taneous distribution law of the 7^ 


PM, yl--, yl) = ~ (1 + 6*-^ + . . • + 

n r(iV«-i)/2- 

The simultaneous distribution of the X^ defined as in (7.12) is given by 

.^2 \2\_. r(Ar — 1)/2 , ^2 , ^2 , , ^ 2\-(y-n;2 

^2, ■ ■ ■ , \k) = r(;fc _ {y2 + ^2 + ' " + 7^*) 

(7.15) , 

Yj 

1^1 r(Ar„-i)/ 2 - 


²¹ Whittaker and Watson, loc. cit., pp. 283, 383.

8. Conclusion. In this paper we have presented further instances of the
applicability of the theory of characteristic functions to the distribution problem
of statistics. In a subsequent paper the author hopes to illustrate the application
of the results here developed to specific numerical problems.

Washington, D. C.



ON A CRITERION FOR THE REJECTION OF OBSERVATIONS AND 
THE DISTRIBUTION OF THE RATIO OF DEVIATION TO 
SAMPLE STANDARD DEVIATION 

By William R. Thompson 

Criteria for the rejection of outlying observations may be designed to reject a
given fraction of all observations, or a proportion varying with the size of the
sample. Irwin¹ has discussed several criteria based on sampling from a normal
population which had been used previously, as well as one which he proposed.
This is based on the principle of fixing the expectation of rejecting an observation
from a sample independently of the aggregate number, N, of the sample. The
criterion, λ, is 1/σ times the interval between successive observations in ascending
order of magnitude, where σ is the standard deviation of the sampled population.
In the same paper he gave, for different values of N, a table of P₁(λ) and P₂(λ),
respectively probabilities of exceeding given values of λ for the first or second
such interval from either end. In actual use, however, σ is estimated from the
sample standard deviation, and we are left to decide whether observations in
question are to be included or not in estimating the standard deviation as also
whether or not to modify this by addition or subtraction of an estimate of its
probable error. The object of the present communication is to develop a
criterion free from defects of this nature, depending only on the assumption of
random sampling from a normal universe. For this purpose we develop the
distribution of τ defined by

$$\tau = \frac{\delta}{s} \tag{1}$$

where s is the sample standard deviation and δ is the deviation of an arbitrary
observation of the sample from the sample mean. This leads to definite criteria,
which are simple in application.

Accordingly, consider a sample $\{x_i\}$, $i = 1, \cdots, N$, to be drawn at random
from a normal population of unknown mean and standard deviation, and that
the order of enumeration is arbitrary. Then $x_N$ is an arbitrary one of the
elements or observations. Now, let

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i, \tag{2}$$

$$s^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^2, \tag{3}$$

and

$$\delta = x_N - \bar{x}.$$

Then we will prove that the distribution of $\tau \equiv \delta/s$ in repeated sampling with a
fixed aggregate number, N, is given by substitution of

$$\sqrt{n}\cdot z = t = \frac{\sqrt{n}\cdot\tau}{\sqrt{n + 1 - \tau^2}}$$

in the z or t distribution of "Student" and R. A. Fisher,* where n = N − 2.
To this end let N > 2, and let n = N − 2, and

$$(n+1)\,\bar{x}_1 = \sum_{i=1}^{n+1} x_i, \qquad S_1(x - \bar{x}_1)^2 = \sum_{i=1}^{n+1} (x_i - \bar{x}_1)^2. \tag{4}$$

Obviously, $(n+1)\,\bar{x}_1 + x_N = N\cdot\bar{x}$, whence

$$\bar{x} - \bar{x}_1 = \frac{x_N - \bar{x}}{n+1} = \frac{\delta}{n+1}, \tag{5}$$

whence

$$x_N - \bar{x}_1 = \frac{n+2}{n+1}\cdot\delta.$$

Furthermore, $N\cdot s^2 = S_1(x - \bar{x}_1)^2 + (n+1)(\bar{x}_1 - \bar{x})^2 + (x_N - \bar{x})^2$, whence

$$N\cdot s^2 = S_1(x - \bar{x}_1)^2 + \frac{n+2}{n+1}\,\delta^2. \tag{6}$$

Now, considering the separate samples, $\{x_i\}$, $i = 1, \cdots, N-1$, and $\{x_N\}$,
of aggregate number, N − 1 and 1, respectively; Fisher has shown* that if we
set

$$t = \frac{(x_N - \bar{x}_1)\cdot\sqrt{n}}{\sqrt{S_1(x - \bar{x}_1)^2}}\cdot\sqrt{\frac{n+1}{n+2}}, \tag{7}$$

then, for $t_0 > 0$, the probability, p, that $t < t_0$ is

$$p = \frac{1}{2} + \frac{\Gamma\!\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\;\Gamma\!\left(\frac{n}{2}\right)} \int_0^{t_0} \left(1 + \frac{t^2}{n}\right)^{-\frac{n+1}{2}} dt, \tag{8}$$

and P = 2(1 − p) is the probability that $|t| > t_0$.
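Now, (5) and (6) in (7) give

$$t = \frac{\frac{n+2}{n+1}\,\delta\cdot\sqrt{n}}{\sqrt{N\cdot s^2 - \frac{n+2}{n+1}\,\delta^2}}\cdot\sqrt{\frac{n+1}{n+2}} = \frac{\delta\cdot\sqrt{n}}{\sqrt{(n+1)\,s^2 - \delta^2}}, \tag{9}$$

whence

$$t = \frac{\sqrt{n}\cdot\tau}{\sqrt{n + 1 - \tau^2}}, \qquad \frac{\tau}{\sqrt{n+1}} = \sin\theta, \qquad \tan\theta = \frac{t}{\sqrt{n}}. \tag{10}$$

Accordingly, P is the probability that $|\tau| > \tau_0 \equiv t_0\sqrt{n+1}\,/\sqrt{n + t_0^2}$.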
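The correspondence between τ and t can be checked numerically. The following is a minimal Python sketch and not part of the original paper; the function names `tau_from_t` and `t_from_tau` are illustrative assumptions, and n = N − 2 as in the text.

```python
import math


def tau_from_t(t: float, N: int) -> float:
    """Convert a Student t value (n = N - 2 degrees of freedom) to tau.

    Uses tau = t * sqrt(n + 1) / sqrt(n + t^2), the inverse of relation (10).
    """
    n = N - 2
    return t * math.sqrt(n + 1) / math.sqrt(n + t * t)


def t_from_tau(tau: float, N: int) -> float:
    """Convert tau back to t using t = sqrt(n) * tau / sqrt(n + 1 - tau^2)."""
    n = N - 2
    return math.sqrt(n) * tau / math.sqrt(n + 1 - tau * tau)


if __name__ == "__main__":
    # First row of Table I: N = 3, t = 9.51.
    print(tau_from_t(9.51, 3))
    # Round trip for an arbitrary illustrative value.
    print(t_from_tau(tau_from_t(3.2, 20), 20))
```

Since |τ| can never exceed √(n + 1), the conversion from τ to t is only defined inside that range.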

Thus, if we want to determine $\tau_0$ so that by rejecting all observations deviating
from the sample mean by more than $s\cdot\tau_0$ we shall have an average relative
frequency of rejections per sample which is fixed, say φ; then we need only
to set P = φ/N. This follows at once from the hypothesis as $x_N$ is a random
element of the random sample of N elements drawn from the same normal
universe (of unknown mean and standard deviation). The criterion of rejection,
$s\cdot\tau_0$, is uniquely determined from the sample standard deviation and
number of elements, N, for any prescribed φ. Dropping the subscript, critical
values of τ are given in Table I (together with corresponding values of t)
for φ = 0.2, 0.1, and 0.05 and values of n ≡ N − 2 which should be sufficient
for most practical purposes. The normal deviate (for unit standard deviation
and the same P) lies between these values and is approached by τ and t (in the
tabulated range of φ) from opposite sides as n increases, the approximation to τ
being the closer of the two. Accordingly Sheppard's tables may be used with
good approximation for n > 1000, with φ/N = P, the probability of exceeding
numerically the given deviate. They may be used to advantage also in
interpolation between n = 100, 1000 by means of differences at the tabulated
points.


TABLE I

              τ for given φ                     t for given φ
    N     φ = 0.2     0.1       0.05       φ = 0.2     0.1      0.05        n

    3     1.40646   1.41228   1.41373       9.51      19.08    38.19        1
    4     1.6454    1.6887    1.7103        4.30       6.20     8.84        2
    5     1.791     1.869     1.917         3.48       4.54     5.84        3
    6     1.895     1.997     2.067         3.19       3.97     4.84        4
    7     1.973     2.093     2.182         3.04       3.68     4.38        5
    8     2.041     2.170     2.274         2.97       3.51     4.12        6
    9     2.099     2.237     2.348         2.93       3.42     3.94        7
   10     2.144     2.295     2.413         2.89       3.36     3.83        8
   11     2.190     2.343     2.472         2.88       3.31     3.76        9
   12     2.229     2.388     2.521         2.87       3.28     3.70       10
   13     2.262     2.425     2.567         2.86       3.26     3.66       11
   14     2.296     2.463     2.598         2.86       3.24     3.60       12
   15     2.325     2.497     2.636         2.86       3.23     3.58       13
   16     2.357     2.522     2.670         2.87       3.21     3.56       14
   17     2.382     2.553     2.699         2.87       3.21     3.54       15
   18     2.404     2.576     2.733         2.87       3.20     3.54       16
   19     2.429     2.601     2.759         2.88       3.20     3.53       17
   20     2.448     2.625     2.783         2.88       3.20     3.52       18
   21     2.471     2.647     2.800         2.89       3.20     3.50       19
   22     2.487     2.661     2.819         2.89       3.19     3.49       20
   32     2.636     2.819     2.985         2.944      3.216    3.479      30
   42     2.737     2.925     3.093         2.991      3.248    3.489      40
  102     3.047     3.233     3.407         3.182      3.397    3.603     100
  202     3.266     3.448     3.621         3.347      3.546    3.736     200
  502     3.528     3.704     3.872         3.569      3.752    3.927     500
 1002     3.714     3.881     4.047         3.737      3.908    4.078    1000

P = φ/N.

Note: τ is computed to 0.5 unit in the last place given from the given t which is believed
correct to 1 unit in the last place.
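A critical value of τ for prescribed φ and N can also be computed directly rather than read from a table. The sketch below is an illustration under stated assumptions and not the author's procedure: with the substitution t = √n·tan θ the tail probability of Student's t becomes an integral of cos^(n−1) θ, which is evaluated here by Simpson's rule, and θ is then found by bisection so that the two-sided tail equals P = φ/N. The function names and the step count are hypothetical choices.

```python
import math


def two_sided_tail(theta0: float, n: int, steps: int = 2000) -> float:
    """P(|t| > sqrt(n) * tan(theta0)) for Student's t with n degrees of freedom.

    With t = sqrt(n) * tan(theta) the density integrates to
    c * integral of cos(theta)^(n - 1), c = Gamma((n+1)/2) / (sqrt(pi) * Gamma(n/2)).
    The integral is approximated by Simpson's rule (steps must be even).
    """
    log_c = math.lgamma((n + 1) / 2) - math.lgamma(n / 2) - 0.5 * math.log(math.pi)
    h = theta0 / steps
    total = 0.0
    for i in range(steps + 1):
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * math.cos(i * h) ** (n - 1)
    central = math.exp(log_c) * total * h / 3  # P(0 < t < t0)
    return 1.0 - 2.0 * central


def critical_tau(phi: float, N: int) -> float:
    """Critical tau such that the expected number of rejections per sample is phi."""
    n = N - 2
    P = phi / N
    lo, hi = 0.0, math.pi / 2
    for _ in range(60):  # bisection on theta; the tail decreases as theta grows
        mid = (lo + hi) / 2
        if two_sided_tail(mid, n) > P:
            lo = mid
        else:
            hi = mid
    theta = (lo + hi) / 2
    return math.sqrt(n + 1) * math.sin(theta)  # tau / sqrt(n + 1) = sin(theta)


if __name__ == "__main__":
    for N in (3, 4, 10, 22):
        print(N, [round(critical_tau(phi, N), 4) for phi in (0.2, 0.1, 0.05)])
```

An observation would then be rejected when its deviation from the sample mean exceeds s times the value returned, with s computed with divisor N as in the text.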

A crude rejection system where we reject an observation if it deviate from the
mean of all others by more than a fixed constant times the standard deviation of
such a difference in terms of σ as estimated from the variance of these others by

$$\sigma^2 = \frac{S_1(x - \bar{x}_1)^2}{N - 2}$$

amounts to taking a fixed value of t as criterion. The
intention is usually to fix the probability (P) of rejection of observations rather
than the expectation of rejections per sample (φ); and this, of course, is the
expected approximate result for large samples. For small samples, however, say
4 < N < 32, by rejection of observations deviating thus by more than
$3\cdot\sigma\sqrt{(n+2)/(n+1)}$, it follows from (7) and Table I that approximately φ would
be fixed rather than P.

The τ-criterion not only affords a precise extension of such a rejection system,
but also a reduction of the actual process of application to a minimum, with one
noteworthy exception for the case, N = 3. Here we may use as criterion with
identical effect the ratio, $d_2/d_1$, where $x_1 \le x_2 \le x_3$, $d_1 = x_2 - x_1$,
$d_2 = x_3 - x_2$, and $d_2 \ge d_1$. This order can always be adopted for the test,
and it is readily verified that

$$\frac{d_2}{d_1} = \frac{\sqrt{3}\cdot t - 1}{2}, \tag{11}$$

whence for φ = 0.2, 0.1, and 0.05, respectively we have $d_2/d_1 \ge 7.74$, 16.0, and 32.6.
Thus, for N = 3, we may take merely the ratio of the greater to the other
numerical deviation from the median observation as criterion.
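For N = 3 the test reduces to comparing two gaps. The following Python sketch is illustrative only: it assumes the relation d₂/d₁ = (√3·t − 1)/2, which follows from (7) with n = 1, and the function names and the handling of a zero smaller gap are hypothetical choices.

```python
import math


def critical_ratio_from_t(t: float) -> float:
    """Critical gap ratio d2/d1 for N = 3 from the critical t (n = 1)."""
    return (math.sqrt(3) * t - 1) / 2


def reject_outer_observation(sample, critical_ratio: float) -> bool:
    """True when the larger gap to the median is at least critical_ratio times the smaller."""
    low, median, high = sorted(sample)
    d_small, d_large = sorted((median - low, high - median))
    if d_small == 0:
        # Assumption of this sketch: a zero smaller gap counts as an infinite ratio.
        return d_large > 0
    return d_large / d_small >= critical_ratio


if __name__ == "__main__":
    for t in (9.51, 19.08, 38.19):  # critical t for phi = 0.2, 0.1, 0.05
        print(t, critical_ratio_from_t(t))
    print(reject_outer_observation((0.0, 1.0, 9.0), 7.74))
```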


Section 2 

Although not required in connection with the rejection criterion developed
above, there is a simple generalization of τ with a closely related distribution
which may be valuable in somewhat different circumstances. Consider the same
situation as given above, except that $\{x_i\}$ is divided into two subsets, where
$i = 1, \cdots, N-k$, and $i = N-k+1, \cdots, N$, respectively; giving two
random samples of aggregate number, N − k and k. Let the means of these be
$\bar{x}_1$ and $\bar{x}_2$, respectively; and s and $\bar{x}$ be as before. Then in general let

$$\delta = \bar{x}_2 - \bar{x} \quad\text{and}\quad \tau = \frac{\delta}{s}. \tag{12}$$
TABLE II

τ(P,N,1)

   N   P = 0.9   0.8      0.7      0.6      0.5      0.4      0.3       0.2       0.1       0.05      0.02      0.01       N

   3   .221      .437     .643     …        …        1.144    1.260     …         1.3968    …         1.41352   …          3
   4   .173      .347     .520     .693     .866     …        1.212     1.386     1.559     …         1.6974    1.7147     4
   5   .158      …        .476     .639     .808     .983     1.170     1.374     1.611     1.757     1.869     1.9175     5
   6   .149      …        .453     …        .777     .952     1.143     1.360     1.631     1.814     1.973     …          6
   7   .144      .290     …        …        .757     .932     1.128     1.349     …         1.848     …         2.142      7
   8   .141      .284     …        …        .744     .918     1.111     1.340     1.644     …         2.087     …          8
   9   .139      …        …        …        .734     …        1.102     1.334     1.647     1.885     2.121     2.256      9
  10   .137      .276     …        …        .727     .899     1.094     1.328     1.648     1.896     2.146     2.294     10
  11   .136      .274     .416     .564     .721     .893     1.088     1.324     1.649     1.904     2.166     2.324     11
  12   .136      .272     .413     .560     .717     .888     …         …         1.649     …         2.183     …         12
  13   .134      .270     .411     .557     .713     .884     …         1.317     1.649     1.915     2.196     …         13
  14   .134      .269     .408     …        .710     .881     …         1.314     1.649     1.919     2.207     …         14
  15   .133      .268     .407     .552     …        .878     1.073     1.312     1.649     1.923     2.216     …         15
  16   .133      .267     .406     …        .704     .876     1.071     1.310     1.649     1.926     2.224     2.411     16
  17   .132      .266     …        .548     …        .873     1.069     1.309     1.649     1.928     2.231     2.422     17
  18   .132      .265     …        .547     .701     .871     …         1.307     1.649     1.931     2.237     2.432     18
  19   .131      .264     …        .546     .699     .869     1.065     1.305     1.649     1.932     2.242     …         19
  20   .131      .264     …        .544     .698     .868     1.063     1.304     1.649     1.934     2.247     2.447     20
  21   .130      .263     …        .543     .697     .867     1.062     1.303     1.649     1.936     2.251     2.454     21
  22   .130      .263     .399     .542     .696     .865     1.061     1.302     1.649     1.937     2.255     2.460     22
  23   .130      .262     .398     .541     .696     .864     1.059     1.301     1.649     1.938     2.259     2.466     23
  24   .130      .262     .398     .541     .694     …        1.058     1.300     1.649     1.940     2.262     2.470     24
  25   .130      .261     .397     …        .693     .862     …         1.299     1.649     1.941     2.264     2.475     25
  26   .130      .261     .397     .539     .692     .861     1.056     1.299     1.648     1.942     2.267     2.479     26
  27   .129      .261     .397     .538     .691     …        …         1.298     1.648     1.942     2.269     2.483     27
  28   .129      .261     .396     .538     .691     .860     …         1.297     1.648     1.943     2.272     2.487     28
  29   .129      .260     .396     .537     .690     .859     …         1.297     1.648     1.944     2.274     …         29
  30   .129      .260     .396     .537     .690     .859     …         1.296     1.648     1.944     2.276     2.493     30
  31   .129      .260     .396     .536     .689     .858     1.054     1.296     1.648     1.946     2.277     2.496     31
  32   .129      …        .394     .536     .689     .858     1.053     1.296     1.648     1.946     2.279     2.498     32
   ∞   .12566    .25335   .38532   .52440   .67449   .84162   1.03643   1.28155   1.64485   1.95996   2.32634   2.57582    ∞

Note: τ(P,N,k) = τ(P,N,1)·√((N − k)/(k(N − 1))). Entries shown as … are illegible in the source copy.


Further, let $n_1 + 1 = N - k$ and $n_2 + 1 = k$; let $S_1(x - \bar{x}_1)^2$ be the sum of squared
deviations in the first sub-sample and similarly $S_2(x - \bar{x}_2)^2$ be that for the
second. Then Fisher has shown* that the generalized

$$t = \frac{(\bar{x}_2 - \bar{x}_1)\sqrt{n_1 + n_2}}{\sqrt{S_1(x - \bar{x}_1)^2 + S_2(x - \bar{x}_2)^2}}\cdot\sqrt{\frac{(n_1+1)(n_2+1)}{n_1 + n_2 + 2}} \tag{13}$$

is distributed as before for $n = n_1 + n_2$. Obviously,

$$N\cdot\bar{x} = (n_1 + 1)\,\bar{x}_1 + (n_2 + 1)\,\bar{x}_2,$$

whence

$$\delta = \frac{(n_1+1)(\bar{x}_2 - \bar{x}_1)}{N} = \frac{(n_1+1)(\bar{x} - \bar{x}_1)}{n_2 + 1} \tag{14}$$

and

$$S_1(x - \bar{x}_1)^2 + S_2(x - \bar{x}_2)^2 = N\cdot s^2 - (n_1+1)(\bar{x}_1 - \bar{x})^2 - (n_2+1)(\bar{x}_2 - \bar{x})^2 = N\left[s^2 - \frac{n_2+1}{n_1+1}\,\delta^2\right], \tag{15}$$

whence

$$t = \frac{\sqrt{n\cdot k}\cdot\tau}{\sqrt{n + 2 - k - k\cdot\tau^2}}, \qquad (n = N - 2), \tag{16}$$

i.e., $t = \sqrt{n}\cdot\tan\theta$, $\sqrt{n + 2 - k}\cdot\sin\theta = \sqrt{k}\cdot\tau$.


In connection with analysis of variance where the total sample may be divided
into several subsets of observations, the generalized τ may be used, accordingly,
to indicate in a simple manner which (if any) of the means of subsets differ
significantly from the general mean where the equivalent t-test is applicable.

In general let $\tau_{(P,N,k)} \ge 0$ be a number such that P is the probability that
$|\tau| > \tau_{(P,N,k)}$; where, as above, N is the total number of observations in the
whole sample, k is the number of these in the subsample and τ is defined by (12).
Then by (16), obviously,

$$\tau_{(P,N,k)} = \sqrt{\frac{N - k}{k\,(N - 1)}}\cdot\tau_{(P,N,1)}. \tag{17}$$

In Table II are given values of $\tau_{(P,N,1)}$ for a range of values of the arguments, N
and P. The critical values of τ in Table I are simply values of this function for
P = φ/N where φ is taken as parameter, i.e., $\tau_{(\phi/N,N,1)}$.

Rider* has given an interesting review of rejection criteria previously proposed.
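The generalized τ for a subsample of k observations can be obtained either from t or from the k = 1 value. The sketch below is a hedged illustration: it assumes the relation t = √(nk)·τ/√(n + 2 − k − kτ²) with n = N − 2, solved for τ, and the function names are hypothetical.

```python
import math


def generalized_tau_from_t(t: float, N: int, k: int) -> float:
    """tau for a subsample of size k out of N, from the equivalent t (n = N - 2)."""
    n = N - 2
    return t * math.sqrt(N - k) / math.sqrt(k * (n + t * t))


def generalized_t_from_tau(tau: float, N: int, k: int) -> float:
    """Inverse conversion: t = sqrt(n k) tau / sqrt(n + 2 - k - k tau^2)."""
    n = N - 2
    return math.sqrt(n * k) * tau / math.sqrt(n + 2 - k - k * tau * tau)


def tau_k_from_tau_1(tau_1: float, N: int, k: int) -> float:
    """Rescale a k = 1 critical value to a subsample of size k (same P and N)."""
    return tau_1 * math.sqrt((N - k) / (k * (N - 1)))


if __name__ == "__main__":
    # Illustrative values only: N = 10 observations, subsample of k = 2.
    tau_1 = generalized_tau_from_t(2.306, 10, 1)
    print(tau_1, tau_k_from_tau_1(tau_1, 10, 2))
```

The rescaling works because equal P implies equal t, and hence equal θ, for every k.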
ON CERTAIN COEFFICIENTS USED IN MATHEMATICAL STATISTICS
By Everett H. Larguier, S.J.

I. Introduction 

(1.1) We have studied here certain coefficients arising in interpolation, numeri- 
cal differentiation and integration formulas in order to establish explicit expan- 
sions for these coefficients in the form of a finite summation. Ordinarily they 
are obtained by means of recursion relations, which necessarily demand the 
building up of a complete table in order to find the desired set of coefficients. By 
using the methods described in this paper, we are able to calculate any desired 
set independent of the ones which precede it in the table. In the literature we 
find two other expansions of the difference quotients of zero, one by Jeffery¹ and
one by Boole.² Our expansion for the differential quotients of zero is the same as
one obtained by Jeffery,³ however the proof is more elementary and simple.

The Bernoulli numbers also find a wide range of application in many finite 
integration formulas, and hence our attention was drawn to the discussion of 
certain coefficients which occur in the study of these functions.⁴ As in the cases
mentioned above these coefficients are likewise ordinarily obtained by recursion 
formulas, but by our expansions they may be obtained directly. 


II. Difference Quotients of Zero 

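(2.1) It is our purpose here to show that this difference quotient of zero,
$\Delta^m 0^n$, may be expressed by the following summation:

$$\Delta^m 0^n = m! \sum \left(\frac{m}{m-1}\right)^{a_{m-1}} \left(\frac{m-1}{m-2}\right)^{a_{m-2}} \cdots \left(\frac{3}{2}\right)^{a_2} \left(\frac{2}{1}\right)^{a_1} \tag{1}$$

where $a_1, a_2, \cdots, a_{m-1} = 0, 1, 2, \cdots, n-m$ and $a_1 \ge a_2 \ge \cdots \ge a_{m-1} \ge 0$.

Obviously the number of terms in the summation is the number of combinations
of n − m + 1 things taken m − 1 together where repetitions are allowed.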

(2.2) By means of the recursion relation⁵

$$\Delta^m 0^n = m\,\Delta^m 0^{n-1} + m\,\Delta^{m-1} 0^{n-1} \tag{2}$$


¹ Henry M. Jeffery, "On a method of expressing the combinations and homogeneous
products of numbers and their powers by means of differences of nothing." Quarterly
Journal of Pure and Applied Mathematics, vol. 4 (1861), pp. 364 ff.

² George Boole, A Treatise on the Calculus of Finite Differences, (Stechert, N. Y.), p. 20.

³ Loc. cit.

⁴ Steffensen, Interpolation (Williams & Wilkins, Baltimore), p. 125.

⁵ L. M. Milne-Thompson, Calculus of Finite Differences, (Macmillan), p. 36, sec. 2.53, (2).
we are able to build up a table of values. By substitution it can be shown that
(1) satisfies the values of this table except when m = 0, 1 and for m > n, for
then the summation becomes meaningless. We therefore define the summation
to have the value 0 for m = 0, n > 0 and for m > n, and the value 1 for m = 1.
We exhibit one substitution below. When m = 3 and n = 4,

$$\Delta^3 0^4 = 3!\left\{\left(\frac{3}{2}\right)^0\left(\frac{2}{1}\right)^0 + \left(\frac{3}{2}\right)^0\left(\frac{2}{1}\right)^1 + \left(\frac{3}{2}\right)^1\left(\frac{2}{1}\right)^1\right\} = 36.$$
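The agreement between the recursion and the explicit summation can be checked mechanically. The following Python sketch is an illustration, not the author's method of computation: it assumes that in (1) the exponent a_j belongs to the ratio (j + 1)/j, with a₁ ≥ a₂ ≥ … ≥ a_{m−1} ≥ 0 and each exponent at most n − m, and it uses exact rational arithmetic. The function names are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import combinations_with_replacement
from math import factorial


@lru_cache(maxsize=None)
def diff_zero_recursive(m: int, n: int) -> int:
    """Delta^m 0^n from the recursion Delta^m 0^n = m (Delta^m 0^(n-1) + Delta^(m-1) 0^(n-1))."""
    if m == 0:
        return 1 if n == 0 else 0
    if m > n:
        return 0
    return m * (diff_zero_recursive(m, n - 1) + diff_zero_recursive(m - 1, n - 1))


def diff_zero_explicit(m: int, n: int) -> int:
    """Delta^m 0^n as m! times a finite sum of products of ratios (j + 1)/j."""
    if m == 0:
        return 1 if n == 0 else 0
    if m > n:
        return 0
    total = Fraction(0)
    # Each tuple is non-increasing: (a_1, a_2, ..., a_{m-1}) with values in 0..n-m.
    for exponents in combinations_with_replacement(range(n - m, -1, -1), m - 1):
        term = Fraction(1)
        for j, a in enumerate(exponents, start=1):
            term *= Fraction(j + 1, j) ** a
        total += term
    return int(factorial(m) * total)


if __name__ == "__main__":
    print(diff_zero_explicit(3, 4), diff_zero_recursive(3, 4))
```

The explicit form needs no table: each call enumerates only the exponent tuples for the requested m and n.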


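(2.3) Taking (2), we proceed by repeated application of the recursion formula
and finally we have

$$\Delta^m 0^n = m^{n-m}\,\Delta^m 0^m + \sum_{d=m}^{n-1} m^{n-d}\,\Delta^{m-1} 0^{d},$$

which since $\Delta^m 0^m = m!$,⁶ becomes

$$\Delta^m 0^n = m^{n-m}\,m! + \sum_{d=m}^{n-1} m^{n-d}\,\Delta^{m-1} 0^{d}. \tag{3}$$

We will now prove (1). Proceeding by induction we assume (1) true for
m − 1. Hence from (3) we have

$$\Delta^m 0^n = m^{n-m}\,m! + \sum_{d=m}^{n-1} m^{n-d}\,(m-1)! \sum \left(\frac{m-1}{m-2}\right)^{a_{m-2}} \cdots \left(\frac{3}{2}\right)^{a_2}\left(\frac{2}{1}\right)^{a_1},$$

where $a_1, a_2, \cdots, a_{m-2} = 0, 1, 2, \cdots, d-m+1$ and $a_1 \ge a_2 \ge \cdots \ge a_{m-2} \ge 0$.
This becomes

$$\Delta^m 0^n = m^{n-m}\,m! + m! \sum_{d=m}^{n-1} \sum m^{n-d-1} \left(\frac{m-1}{m-2}\right)^{a_{m-2}} \cdots \left(\frac{3}{2}\right)^{a_2}\left(\frac{2}{1}\right)^{a_1}. \tag{4}$$

Using the symbol ΣΣ for the double summation of (4), and writing
$m^{n-d-1} = \left(\frac{m}{m-1}\right)^{n-d-1}\left(\frac{m-1}{m-2}\right)^{n-d-1}\cdots\left(\frac{2}{1}\right)^{n-d-1}$, we may write

$$\Sigma\Sigma = \sum_{d=m}^{n-1} \sum \left(\frac{m}{m-1}\right)^{n-d-1} \left(\frac{m-1}{m-2}\right)^{a_{m-2}+n-d-1} \cdots \left(\frac{2}{1}\right)^{a_1+n-d-1}.$$

Now,

$$m^{n-m} = \left(\frac{m}{m-1}\right)^{n-m} \left(\frac{m-1}{m-2}\right)^{n-m} \left(\frac{m-2}{m-3}\right)^{n-m} \cdots \left(\frac{2}{1}\right)^{n-m},$$

which is the one term of (1) not obtained in ΣΣ as d runs
from m to n − 1. Hence by including $m^{n-m}$ under the summation we are
able to replace the double summation by a single one and have

$$\Delta^m 0^n = m! \sum \left(\frac{m}{m-1}\right)^{a_{m-1}} \left(\frac{m-1}{m-2}\right)^{a_{m-2}} \cdots \left(\frac{2}{1}\right)^{a_1}$$

where $a_1, a_2, \cdots, a_{m-1} = 0, 1, 2, \cdots, n-m$ and $a_1 \ge a_2 \ge \cdots \ge a_{m-1} \ge 0$.

Hence (1) is proved.⁷

⁶ Milne-Thompson, loc. cit.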


III. Differential Quotients of Zero 

(3.1) In Markoff's formula for numerical differentiation we meet coefficients
of the type $D^m 0^{(n)}$. We will show here that this differential quotient of zero
may be expressed by the following finite sum:

$$D^m 0^{(n)} = (-1)^{n-m}\, m! \sum (p_1 p_2 \cdots p_{n-m}) \tag{5}$$

where $p_1 > p_2 > \cdots > p_{n-m} > 0$ take on values from 1, 2, ···, n − 1. Obviously
the number of terms in the expansion will be the same as the number of
combinations of n − 1 things taken n − m together without repetitions.

(3.2) By means of the recursion formula⁸

$$D^m 0^{(n)} = (1 - n)\,D^m 0^{(n-1)} + m\,D^{m-1} 0^{(n-1)} \tag{6}$$

we are able to build up a table of values. By substitution it can easily be shown
that (5) satisfies the values of the table when n > m > 0. For the other values
the summation is meaningless, hence we define it to have the value 1 for
m = n > 0; and the value 0 for m > n and m = 0. When m = 2 and n = 4,
we have

$$D^2 0^{(4)} = (-1)^{4-2}\, 2!\, \{(3\cdot 2) + (3\cdot 1) + (2\cdot 1)\} = 22,$$

which is the same value as found by (6).
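The same kind of check applies to the differential quotients of zero. The sketch below is a hedged illustration with hypothetical function names: it assumes that the explicit sum runs over products of n − m distinct integers chosen from 1, …, n − 1, with sign (−1)^(n−m) and factor m!, and it compares that sum with the values built up from recursion (6).

```python
from functools import lru_cache
from itertools import combinations
from math import factorial, prod


@lru_cache(maxsize=None)
def dq_zero_recursive(m: int, n: int) -> int:
    """D^m 0^(n) from D^m 0^(n) = (1 - n) D^m 0^(n-1) + m D^(m-1) 0^(n-1)."""
    if m == 0:
        return 1 if n == 0 else 0
    if m > n:
        return 0
    return (1 - n) * dq_zero_recursive(m, n - 1) + m * dq_zero_recursive(m - 1, n - 1)


def dq_zero_explicit(m: int, n: int) -> int:
    """D^m 0^(n) as a signed m! multiple of a sum over (n - m)-element subsets of 1..n-1."""
    if m > n:
        return 0
    total = sum(prod(subset) for subset in combinations(range(1, n), n - m))
    return (-1) ** (n - m) * factorial(m) * total


if __name__ == "__main__":
    print(dq_zero_explicit(2, 4), dq_zero_recursive(2, 4))
```

With the empty product taken as 1, the conventions for m = n and m = 0 stated in the text come out of the subset enumeration without special cases.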


⁷ Our expansion may be shown to be equal to that of Jeffery's cited in the introduction,
which expresses $\Delta^m 0^{m+n}$ as m! times the sum of all the homogeneous products
of n dimensions which can be formed by the first m natural numbers and their powers. The
proof of Jeffery's expansion involves the use of complicated symbolic operators, while our
proof uses elementary notions only.

⁸ Steffensen, op. cit., p. 57, 68, (12) and (14).
(3.3) Returning to (6), we obtain by its repeated application:

$$D^m 0^{(n)} = (-1)^{n-m}\,(n-1)^{(n-m)}\,D^m 0^{(m)} + m \sum_{a=0}^{n-m-1} (-1)^a\,(n-1)^{(a)}\,D^{m-1} 0^{(n-a-1)}$$

or, since $D^m 0^{(m)} = m!$,

$$D^m 0^{(n)} = (-1)^{n-m}\,(n-1)^{(n-m)}\,m! + m \sum_{a=0}^{n-m-1} (-1)^a\,(n-1)^{(a)}\,D^{m-1} 0^{(n-a-1)}. \tag{7}$$

In proving (5), we proceed by induction, assuming (5) true for m − 1; hence
by (7) we have

$$D^m 0^{(n)} = (-1)^{n-m}\,(n-1)^{(n-m)}\,m! + m!\,(-1)^{n-m} \sum_{a=0}^{n-m-1} (n-1)^{(a)} \sum (p_1 p_2 \cdots p_{n-m-a}) \tag{8}$$

where $p_1 > p_2 > \cdots > p_{n-m-a} > 0$ take the values 1, 2, ···, n − a − 2.
Expanding the double sum of (8) we have

$$\begin{aligned} \Sigma\Sigma = {} & \sum_{p_s=1}^{n-2} (p_1 \cdots p_{n-m}) + \sum_{p_s=1}^{n-3} (n-1)\,(p_1 \cdots p_{n-m-1}) \\ & + \sum_{p_s=1}^{n-4} (n-1)(n-2)\,(p_1 \cdots p_{n-m-2}) \\ & + \cdots + \sum_{p_1=1}^{m-1} (n-1)(n-2)\cdots(m+1)\,(p_1) \end{aligned} \tag{9}$$

in which $p_1 > p_2 > \cdots > p_s > 0$ always holds, where
s = n − m, n − m − 1, ···, 2, 1
in turn.

Upon inspection, it is evident that (9) contains all the terms of (5) with the
exception of (n − 1)(n − 2) ··· (m + 1)m. Hence, since by definition
$(n-1)^{(n-m)} = (n-1) \cdots (m+1)\,m$, we may include the first term on the
right-hand side of (8) under the summation and then we have proved (5).⁹

IV. The Coefficient $G_n^{(r)}$

(4.1) In discussing the Bernoulli numbers and the Bernoulli polynomials,
Steffensen¹⁰ makes use of the relation:


BM = (- D' 2 

n «=0 


( 10 ) 


⁹ Jeffery's expansion referred to in the introduction expresses this coefficient by means of
the sum of the combinations of the first n − 1 natural numbers taken n − m
together. The remarks made above under article 2.3 concerning symbolic operators also
apply here mutatis mutandis.

¹⁰ Op. cit., p. 125, (24); cf. also Jacobi's theorem, Journal für reine und angewandte
Mathematik (Crelle's Journal), vol. 12, pp. 268-269.
where z = x − … . We wish here to show that the coefficient $G_n^{(r)}$, ordinarily
found by means of recursion formulas, may be obtained from the following
summation:

$$G_n^{(r)} = (2r)^{(2n)} \sum_{N_n=3}^{r-n+1} [N_n] \sum_{N_{n-1}=3}^{N_n+1} [N_{n-1}] \cdots \sum_{N_1=3}^{N_2+1} [N_1] \tag{11}$$

where $[N] = (N)^{(2)}/(2N)^{(4)}$. Obviously the summation has no meaning for
n = 0, nor for r < n + 2. Therefore it will be necessary to make definitions
or devise other schemes for meeting this difficulty.

Steffensen¹¹ shows that

$$G_0^{(r)} = 1 \ \text{ for } r \ge 0; \qquad G_{r-1}^{(r)} = 0 \ \text{ for } r > 1; \tag{12}$$

and likewise he gives the following recursion relation:

$$(2r - 2n)^{(2)}\,G_n^{(r)} = (2r)^{(2)}\,G_n^{(r-1)} + (r - n + 1)^{(2)}\,G_{n-1}^{(r)}. \tag{13}$$

In accordance with (12), we define the sum of (11) to be equal to 1 for n = 0,
and to be equal to 0 for n = r − 1, when r > 1. By means of the recursion
formula (13), Steffensen¹² gives a table of values of $G_n^{(r)}$, which (11) may be
easily shown to satisfy. From this table we have the value $G_3^{(6)} = 10$. Using
this as an example of the expansion, we have by (11):

$$\begin{aligned} G_3^{(6)} &= (12)^{(6)} \sum_{N_3=3}^{4} [N_3] \sum_{N_2=3}^{N_3+1} [N_2] \sum_{N_1=3}^{N_2+1} [N_1] \\ &= (12)^{(6)} \big\langle [3]\{[4]([5] + [4] + [3]) + [3]([4] + [3])\} \\ &\qquad + [4]\{[5]([6] + [5] + [4] + [3]) + [4]([5] + [4] + [3]) + [3]([4] + [3])\} \big\rangle \\ &= 10. \end{aligned}$$
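The nested summation can be evaluated exactly with rational arithmetic. The following Python sketch is illustrative and rests on stated assumptions: x^(k) is read as the falling factorial x(x − 1)···(x − k + 1), [N] as N^(2)/(2N)^(4), the upper limit of the outermost sum as r − n + 1 and of each inner sum as one more than the index outside it. The names `falling`, `bracket`, `nested_sum` and `G` are hypothetical.

```python
from fractions import Fraction


def falling(x: int, k: int) -> int:
    """Falling factorial x (x - 1) ... (x - k + 1); equals 1 for k = 0."""
    out = 1
    for i in range(k):
        out *= x - i
    return out


def bracket(N: int) -> Fraction:
    """[N] = N^(2) / (2N)^(4)."""
    return Fraction(falling(N, 2), falling(2 * N, 4))


def nested_sum(depth: int, upper: int) -> Fraction:
    """Sum over N = 3..upper of [N] times the next inner sum with upper limit N + 1."""
    if depth == 0:
        return Fraction(1)
    return sum(
        (bracket(N) * nested_sum(depth - 1, N + 1) for N in range(3, upper + 1)),
        Fraction(0),
    )


def G(r: int, n: int) -> Fraction:
    """Coefficient with upper index r and lower index n, from the nested summation."""
    return falling(2 * r, 2 * n) * nested_sum(n, r - n + 1)


if __name__ == "__main__":
    print(G(6, 3))
```

In this reading the conventions of (12) need no special handling: depth 0 returns 1, and an upper limit below 3 gives an empty sum.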

(4.2) Before proving the general case, we will prove by induction that

$$G_1^{(r)} = (2r)^{(2)} \sum_{N_1=3}^{r} [N_1]. \tag{14}$$

Assuming (14) true for r − 1, we have by (12) and (13)

$$G_1^{(r)} = (2r)^{(2)} \sum_{N_1=3}^{r-1} [N_1] + (2r)^{(2)}\,[r] = (2r)^{(2)} \sum_{N_1=3}^{r} [N_1].$$

Hence (14) is valid.

(4.3) We shall prove (11) with respect to r. By repeated application of (13),
we have

$$\begin{aligned} G_n^{(r)} = {} & \{(2r)^{(2)}/(2r-2n)^{(2)}\}\,G_n^{(r-1)} + \{(2r)^{(2)}(r-n+1)^{(2)}/(2r-2n+2)^{(4)}\}\,G_{n-1}^{(r-1)} \\ & + \{(2r)^{(2)}(r-n+1)^{(2)}(r-n+2)^{(2)}/(2r-2n+4)^{(6)}\}\,G_{n-2}^{(r-1)} + \cdots \\ & + \{(2r)^{(2)}(r-n+1)^{(2)} \cdots (r-1)^{(2)}/(2r-2)^{(2n)}\}\,G_1^{(r-1)} \\ & + \{(r-n+1)^{(2)} \cdots (r)^{(2)}/(2r-2)^{(2n)}\}\,G_0^{(r)} \\ = {} & (2r)^{(2n)} \sum_{N_n=3}^{r-n} [N_n] \cdots \sum_{N_1=3}^{N_2+1} [N_1] \\ & + (2r)^{(2n)}\,[r-n+1] \sum_{N_{n-1}=3}^{r-n+1} [N_{n-1}] \cdots \sum_{N_1=3}^{N_2+1} [N_1] \\ & + (2r)^{(2n)}\,[r-n+1]\,[r-n+2] \sum_{N_{n-2}=3}^{r-n+2} [N_{n-2}] \cdots \sum_{N_1=3}^{N_2+1} [N_1] \\ & + \cdots + (2r)^{(2n)}\,[r-n+1]\,[r-n+2] \cdots [r-1] \sum_{N_1=3}^{r-1} [N_1] \\ & + (2r)^{(2n)}\,[r-n+1] \cdots [r]. \end{aligned}$$

It is evident from inspection that this is nothing but an expanded form of (11),
hence (11) is proved with respect to r.

¹¹ Op. cit., p. 126.
¹² Op. cit., p. 126.

(4.4) Proceeding in the same way as above to prove induction with respect to
n, we have again by repeated application of (13)

$$\begin{aligned} G_n^{(r)} = {} & \{(r-n+1)^{(2)}/(2r-2n)^{(2)}\}\,G_{n-1}^{(r)} + \{(2r)^{(2)}(r-n)^{(2)}/(2r-2n)^{(4)}\}\,G_{n-1}^{(r-1)} \\ & + \{(2r)^{(4)}(r-n-1)^{(2)}/(2r-2n)^{(6)}\}\,G_{n-1}^{(r-2)} \\ & + \cdots + \{(2r)^{(2r-2n-4)}(3)^{(2)}/(2r-2n)^{(2r-2n-2)}\}\,G_{n-1}^{(n+2)} \\ = {} & (2r)^{(2n)}\,[r-n+1] \sum_{N_{n-1}=3}^{r-n+2} [N_{n-1}] \cdots \sum_{N_1=3}^{N_2+1} [N_1] \\ & + (2r)^{(2n)}\,[r-n] \sum_{N_{n-1}=3}^{r-n+1} [N_{n-1}] \cdots \sum_{N_1=3}^{N_2+1} [N_1] \\ & + \cdots + (2r)^{(2n)}\,[4] \sum_{N_{n-1}=3}^{5} [N_{n-1}] \cdots \sum_{N_1=3}^{N_2+1} [N_1] \\ & + (2r)^{(2n)}\,[3] \sum_{N_{n-1}=3}^{4} [N_{n-1}] \cdots \sum_{N_1=3}^{N_2+1} [N_1]. \end{aligned}$$

From this latter equation, (11) follows immediately and therefore the proof is
complete.

(4.5) Bernoulli numbers may be expressed in terms of this coefficient $G_n^{(r)}$ as is
shown by Steffensen,¹³ in the following way

$$B_{2r} = (-1)^r\,G_r^{(r)} \tag{15}$$

which we shall express in terms of (11). However as (11) is meaningless for
n = r, we obtain the relation

$$(2r + 2)^{(2)}\,G_r^{(r)} = -(2)^{(2)}\,G_{r-1}^{(r+1)} \quad \text{for } r > 0, \tag{16}$$

which follows immediately from (12) and (13), and thereby obviate this difficulty.
Hence, by (11), (15) and (16), we can write

$$B_{2r} = \{(-1)^{r+1}\,(2r)!/(4)^{(2)}\} \sum_{N_{r-1}=3}^{3} [N_{r-1}] \sum_{N_{r-2}=3}^{N_{r-1}+1} [N_{r-2}] \cdots \sum_{N_1=3}^{N_2+1} [N_1]. \tag{17}$$

We note here that the definitions of the summation, given in 4.1, likewise hold.

¹³ Op. cit., p. 125, (27).
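The direct expression for the Bernoulli numbers can be tried for the first few even indices. This is a minimal Python sketch under assumptions: the coefficient in front of the nested sum is taken as (−1)^(r+1)(2r)!/4^(2) with 4^(2) = 12, the outermost sum has the single term N = 3, there are r − 1 nested sums in all, and the empty nesting (r = 1) is given the value 1. The helper names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def falling(x: int, k: int) -> int:
    """Falling factorial x (x - 1) ... (x - k + 1)."""
    out = 1
    for i in range(k):
        out *= x - i
    return out


def bracket(N: int) -> Fraction:
    """[N] = N^(2) / (2N)^(4)."""
    return Fraction(falling(N, 2), falling(2 * N, 4))


def nested_sum(depth: int, upper: int) -> Fraction:
    """Sum over N = 3..upper of [N] times the inner sum whose upper limit is N + 1."""
    if depth == 0:
        return Fraction(1)
    return sum(
        (bracket(N) * nested_sum(depth - 1, N + 1) for N in range(3, upper + 1)),
        Fraction(0),
    )


def bernoulli_even(r: int) -> Fraction:
    """B_{2r} for r >= 1 from the nested summation with outer upper limit 3."""
    sign = 1 if r % 2 == 1 else -1  # (-1)^(r + 1)
    return sign * Fraction(factorial(2 * r), falling(4, 2)) * nested_sum(r - 1, 3)


if __name__ == "__main__":
    for r in range(1, 6):
        print(2 * r, bernoulli_even(r))
```

Each value is produced without building the earlier ones, which is the advantage claimed for the explicit expansion.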

Saint Louis University
Saint Louis, Missouri



NOTICE OF THE ORGANIZATION OF THE INSTITUTE OF 
MATHEMATICAL STATISTICS 

For some time there has been a feeling that the theory of statistics would be
advanced in the United States by the formation of an organization of those per- 
sons especially interested in the mathematical aspects of the subject. As a con- 
sequence, a meeting of interested persons was arranged for September 12, 1935, at 
Ann Arbor, Michigan. At the meeting, it was decided to form an organization 
to be known as the Institute of Mathematical Statistics. A constitution and 
by-laws were adopted and the following officers elected to serve until December 
31st, 1936: President, H. L. Rietz; Vice-president, W. A. Shewhart; Secretary- 
Treasurer, A. T. Craig. A resolution, instructing the officers to investigate the 
feasibility of the affiliation of the Institute with the American Mathematical 
Society or with the American Statistical Association, was adopted. 

The constitution provides that membership in the Institute shall consist of 
Members, Fellows, Honorary Members, and Sustaining Members. A committee
on membership will establish qualifications requisite for the different
grades of membership. The annual dues of members and fellows are five dollars 
a year and these include a year's subscription to the official journal, the Annals 
of Mathematical Statistics. 

The next meeting of the Institute will be held in St. Louis, Missouri, in 
December of this year in connection with the meetings of the American Associa-
tion for the Advancement of Science, the American Mathematical Society, and 
other organizations. 

Forms for application for membership in the Institute may be had by writing 
the Secretary-Treasurer at the University of Iowa, Iowa City, Iowa. 